WEIGHTED vs. UNIT SCALES


EDWARD K. STRONG, JR. 
Stanford University 


Scoring test blanks is a routine, boresome job. Any discovery 
of a true short-cut would be a real contribution. Unfortunately 
in the elation of inventing a possible new procedure, the judg- 
ment is sometimes befogged and short-cuts are recommended 
which are distinctly less valid than the accepted routine. 

It is appropriate to ask the question: what criteria should be 
employed to determine the acceptability of a short-cut? The 
answer is not easy for it involves values which are incommensurable
at the present time.

Such values as reliability and validity, fudging, convenience 
and competition with other tests must all be considered. The 
short-cut should have approximately the same reliability as the 
regular procedure. But how much is ‘approximately’? Is 
.90 approximately .92? Is .86 approximately .88? Or, in the 
case of validity, how much decrease in validity is possible before 
reaching the invalid point? 

The responses to many paper and pencil tests can be fudged. 
Usually it is the items with the largest weights that are most 
likely to be so treated, for they are the items which are most
clearly indicative of what is to be measured. If we short-cut 
the scoring of an interest test, for example, by dropping out all 
the items with low weights we increase very greatly the possi- 
bility of fudging. This would affect the usefulness of the test 
in selection, for applicants for a job are prone to try and make a 
good impression, that is, respond as desired by the boss rather 
than as they personally feel. The short-cut might, however, be 
acceptable for guidance purposes as there would be little motive
for fudging in such case.

Convenience is what we are most interested in when we try to
find a short-cut and convenience is the major defense for any 
short-cut, even at the expense of reliability and validity.
Convenience may be thought of in terms of ease of administration
(a short test requires less time to give than a long one) and
particularly in terms of ease of scoring. The shorter the test
and the easier it is to score, the less the cost to administer. Usually
the budget is limited and the question arises: shall we test one
hundred persons on the best test or two hundred persons on a
substitute test?

This situation leads naturally to the further consideration by 
the owner of the test—shall I try to sell a good test to a few 
people or sell a poorer test to many people? Or shall I put on 
the market a fair substitute to keep people from using a still 
poorer test marketed by a competitor? How far should the 
profit motive be considered here? How can profit be evaluated 
against reliability and validity? 

When one tries to decide whether a short-cut is desirable or not 
one not only has to consider the above factors one at a time, but 
also to weigh them one against another. How much decrease 
in reliability and/or validity can be overlooked if the cost is 
reduced a third, a half, two-thirds? How much increase in 
fudging—which we probably will not discover—is pardonable if 
we decrease the cost appreciably? Where validity is unknown 
or only roughly estimated, how far can reliability be accepted as a 
measure of usefulness and, furthermore, how much decrease in 
reliability under such conditions can be permitted for the sake of 
greater convenience? 

We psychologists sometimes talk about ethics—usually with 
reference to the unethical practices of others who are injuring us. 
Ethics is surely involved in consideration of such a question as: 
How far should profit of the owner of a test and convenience of 
the user of the test be permitted to interfere with the goodness of 
the test? Put this way, the answer seems easy—not at all. 
But there is the complication that the convenient method means 
decreased cost and consequent wider availability of a still good 
test. How then? 

The writer has plagued some of his friends with these questions. 
No two have entirely agreed. If the reader has the answers, the 
writer, at least, would like to hear them.

The following discussion involves some of the values mentioned
above as they pertain to the very practical problem of whether
unit scales should be used as substitutes for weighted scales in
scoring the Vocational Interest Test. By unit scale we mean one
in which the items are weighted 1, -1, or 0 instead of from 4 to
-4. The advantage of unit scales is that they can be scored
in half the time of weighted scales on the IBM Test Scoring
Machine.*

Years ago we investigated the possibility of using unit scales
and came to the conclusion that they did not differentiate
occupations as well as weighted scales. At that time the scales
were based on criterion groups of about one hundred cases. 
Since then the criterion groups have been increased to about 
two hundred fifty for men’s scales and are being increased to 
about four hundred for the new and revised women’s scales. 
Apparently, increasing the size of the criterion groups has been 
accompanied by decrease in the superiority of weighted over 
unit scales, for more recent investigations do not support the 
clear cut superiority of weighted over unit scales reported earlier. 

Dunlap and his associates have recently published extensive 
data which they believe prove that unit scales give scores which 
approximate weighted scores sufficiently so that unit scales may 
be substituted for weighted scales'*** also® (pp. 626-633). 
Their proof rests upon three statistical findings, namely: that 
scores on the two scales correlate in the nineties with one another; 
second, that there is over seventy per cent agreement in the 
ratings on the two scales; and third, that such errors as occur are 
relatively unimportant. 

Let us review their findings very briefly and then consider
their significance in the light of other measurements not mentioned
by them.


Correlation between Weighted and Unit Scales.—Correlations 
reported by Dunlap and his associates between weighted and unit 
scores range from .854 to .991. Medians of the coefficients in
nine reports range from .940 to .976, with an average of these 
nine medians of .961. (The average of our own coefficients is 





* There is also a reduction in labor in scoring unit scales on the IBM 
Counting Sorter and in ordinary hand scoring. Our experience is that 
there is little or no reduction in time in hand scoring with Veeder Counters 
and none at all with the IBM Tabulator (Hollerith Machine),* (p. 623). 
















196 The Journal of Educational Psychology 


.945.) These coefficients look very high, but when expressed in 
terms of per cent of efficiency, i.e., per cent better than chance, 
they become seventy-two and sixty-seven per cent, respectively. 
On such a basis we cannot accept unit scales as equivalent to 
weighted scales. 
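
The conversion from a correlation coefficient to "per cent better than chance" can be checked in a few lines. The sketch below assumes that the index intended is the familiar quantity 1 - sqrt(1 - r^2); the function name is illustrative and does not come from the source.

```python
import math


def percent_efficiency(r: float) -> float:
    """Per cent better than chance, assumed to be 100 * (1 - sqrt(1 - r^2))."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


# The two averages quoted in the text: Dunlap's nine medians and our own coefficients.
for r in (0.961, 0.945):
    print(f"r = {r:.3f}: about {percent_efficiency(r):.0f} per cent better than chance")
```

Under that assumption the two coefficients come out at about seventy-two and sixty-seven per cent, in line with the figures given above.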

Percentage of Agreement in Ratings.—It is common procedure 
to convert raw or standard scores to letter ratings on the following 
basis for standard scores, where 50 equals the mean of the 
criterion group and 10 equals the standard deviation. 


Letter Rating Standard Scores 
A 45 and up 
B+ 40 to 44 
B 35 to 39 
B- 30 to 34 
C+ 25 to 29 
C 24 and below 


An A rating is defined as having the interests of the occupation, a 
C rating as not having the interests of the occupation, and 
ratings of B+, B and B- as probably having these interests but
with decreasing chances from B+ to B-. In most cases only
A, B+ and B ratings are considered to be significant. 
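
The conversion table above amounts to a simple lookup. A minimal sketch follows; the function name is hypothetical, and the cut-offs are taken directly from the table.

```python
def letter_rating(standard_score: float) -> str:
    """Convert a standard score (criterion mean 50, SD 10) to a letter rating."""
    # Lower bound of each rating, from highest to lowest.
    cutoffs = [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]
    for lower_bound, rating in cutoffs:
        if standard_score >= lower_bound:
            return rating
    return "C"  # 24 and below


print([letter_rating(s) for s in (52, 44, 37, 31, 25, 12)])
```

Each of the four middle ratings spans five standard scores, or half a standard deviation of the criterion group, whereas A and C are open-ended.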

Dunlap’s calculation of agreement between weighted and unit
scales is made not in terms of scores, but of
ratings. A summary of the data of Dunlap and his associates
is given in Table 1, together with our own data. The first four 
sets of data agree very closely, and differ appreciably from those 
of Lester and Traxler. Our own data agree closely with the 
average of the other five except that we have nearly three per 
cent more 2- and 3-rating changes, which is a serious matter,
since such changes usually induce a different interpretation of the 
person’s interests. 

Dunlap and his associates could not score a criterion group on 
its unit scales and determine critical scores for letter ratings 
directly from the mean and standard deviation of the criterion 
group. Instead they calculated critical scores of the unit 
scales from the scores of weighted scales by regression equations. 
Such calculated critical scores should give greater agreement 
between two sets of scores than would be obtained by using 
critical scores based on the scores of a criterion group. Comparisons
with three samples totaling 920 cases show 2.9 per cent
greater agreement between weighted and unit scores based on 
regression equations than between weighted and unit scores 
based on criterion groups. 


TABLE 1. PERCENTAGE OF AGREEMENT IN RATINGS BETWEEN
UNIT AND WEIGHTED SCALES

                                       Per cent     Change    Change    Change
                                       of agree-    of one    of two    of three
Researches              Sex            ment         rating    ratings   ratings

Peterson and Dunlap     men            72.5         26.3      1.2
Kogan and Gehlmann      [illegible]    [illegible]  24.8      1.0       0.1
Harper and Dunlap       women          76.6         22.5      0.9
Harper and Dunlap       women          77.9         20.9      1.2
Lester and Traxler      men            67.7         32.0      0.3
Average                                73.8         25.3      0.9
Strong, see Table 4     women          71.2         24.9      3.6       0.2


Since the norms for an interest scale should be based upon the 
scores of the criterion group or another similar group, and not 
upon any indirect method of estimating those norms, the calcu- 
lations of Dunlap and his associates should show approximately 
three per cent greater agreement between weighted and unit 
ratings than is warranted. This is not a great amount, but it 
needs to be considered when one is determining the relationship 
between two procedures. 

Significance of Errors.—How serious are the shifts in ratings 
which occur? Peterson and Dunlap say: 

These shifts, however, are for the most part not significant.
It is of no importance if one method rates an individual C and
the other C+, or one rates him A and the other B+. Counselors
rarely attach much weight to scores less than B+; thus,
the critical scores are those that shift between B and B+. Of
particular importance are those cases where the individual has
an original score of B+, but according to the simplified
scoring, would be rated only B. If no favorable advice is
given on a B rating, the individual’s attention is not called
to the field. If, however, the true score is B and the simplified
score is B+, then slightly more emphasis is given to the
occupation than is its due. This is not so serious as the failure 
of the counselor to mention the occupation® (pp. 272-3). 


Peterson and Dunlap’s Table IV shows that in 3.5 per cent of 
all ratings an original B+ shifts to B with the unit scoring system. 
Kogan and Gehlmann report 3.5 per cent, and Harper and 
Dunlap 2.3 per cent. Peterson and Dunlap conclude that 
“advice will be in error only once in thirty-three times.” These
writers point out that if it is desired to eliminate such errors, all 
ratings of B on the unit scales may be rescored on the weighted 
scales. This would mean, according to their data, the rescoring 
of from ten to fifteen per cent of all scores, as there is that 
percentage of B ratings on their unit scales. 


PERCENTAGE OF AGREEMENT OF RATINGS NOT A 
SATISFACTORY MEASURE OF VALIDITY 


Table 2 gives the distributions of ratings of two hundred women 
students scored on five pairs of weighted and unit occupational 
scales. Table 3 gives similar data respecting four hundred 
twenty-five women librarians. These two sets of data may be 
expressed as follows: 


                                          Students   Librarians

Shift of three ratings upward...........     0.2         0.7
Shift of two ratings upward.............     2.6        10.9
Shift of one rating upward..............    21.4        26.3
Agreement...............................    73.4        59.6
Shift of one rating downward............     2.2         2.4
Shift of two ratings downward...........     0.2         0.1


A shift ‘upward’ means that the unit scale rating is higher than 
the weighted scale rating, e.g., a B rating on the latter becomes a 
B+ or A on the former. These data may be converted into the 
form employed by Dunlap in Table 1, by combining the upward 
and downward shifts, so that we would have here for students, 
73.4 per cent agreement, 23.6 per cent of 1-rating shifts, 2.8 per
cent of 2-rating shifts and 0.2 per cent of 3-rating shifts. 

Two things are apparent in the above data. First, there is 
greater agreement between weighted and unit scores among 
students than among librarians, i.e., 73.4 and 59.6 per cent
agreement, respectively, although both groups have been scored
on the same five pairs of scales. Second, most of the shifts in
ratings are upwards, that is, higher ratings are obtained on unit 
scales than on weighted scales. 


TABLE 2. DISTRIBUTION OF WEIGHTED AND UNIT SCALE
RATINGS OF 200 STUDENTS ON FIVE OCCUPATIONAL SCALES

[Table body not legible in the source.]


TABLE 3. DISTRIBUTION OF WEIGHTED AND UNIT SCALE
RATINGS OF 425 LIBRARIANS ON FIVE OCCUPATIONAL SCALES

Weighted                     Unit Scale
Scale         A      B+     B      B-     C+     C      Total

A           22.0    0.3                                  22.3
B+           4.6    4.3    0.4                            9.3
B            1.6    4.6    3.8    0.4                    10.4
B-           0.2    3.2    4.3    3.5    0.4    0.1      11.7
C+                  0.5    2.6    5.4    2.7    0.9      12.1
C                                 3.5    7.4   23.3      34.2
Total       28.4   12.9   11.1   12.8   10.5   24.3     100.0
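
The summary of shifts for librarians can be recovered from the cells of Table 3 by summing along the diagonals of the cross-tabulation. The sketch below is illustrative only: the function names are hypothetical, and blank cells of the table are assumed to be zero.

```python
# Rows: weighted-scale rating; columns: unit-scale rating; both ordered A, B+, B, B-, C+, C.
# Entries are per cent of the 425 librarians' ratings; blank cells are assumed to be zero.
TABLE_3 = [
    [22.0, 0.3, 0.0, 0.0, 0.0, 0.0],
    [4.6, 4.3, 0.4, 0.0, 0.0, 0.0],
    [1.6, 4.6, 3.8, 0.4, 0.0, 0.0],
    [0.2, 3.2, 4.3, 3.5, 0.4, 0.1],
    [0.0, 0.5, 2.6, 5.4, 2.7, 0.9],
    [0.0, 0.0, 0.0, 3.5, 7.4, 23.3],
]


def shift_distribution(table):
    """Total per cent for each signed shift (positive = unit rating is higher)."""
    shifts = {}
    for i, row in enumerate(table):      # i indexes the weighted rating
        for j, pct in enumerate(row):    # j indexes the unit rating
            if pct:
                step = i - j             # ratings run from A down to C
                shifts[step] = shifts.get(step, 0.0) + pct
    return {k: round(v, 1) for k, v in sorted(shifts.items(), reverse=True)}


def folded(shifts):
    """Combine upward and downward shifts, as in the form used in Table 1."""
    out = {}
    for step, pct in shifts.items():
        out[abs(step)] = round(out.get(abs(step), 0.0) + pct, 1)
    return dict(sorted(out.items()))


signed = shift_distribution(TABLE_3)
print(signed)
print(folded(signed))
```

The signed totals reproduce the librarian column given earlier (0.7, 10.9, 26.3, 59.6, 2.4 and 0.1), and folding them gives the agreement and the 1-, 2- and 3-rating shifts in Dunlap's form.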


























These results raise three questions: (1) why does the agree- 
ment between the same weighted and unit scales vary with 
different groups of blanks; (2) is the variability between the two 
sets of scores attributable to the unreliability of the scales; and (3)
why do unit scales give higher scores than weighted scales?


VARIABILITY IN AGREEMENT BETWEEN WEIGHTED AND
UNIT SCORES 


That agreement between weighted and unit scales does vary 
greatly with different blanks is shown in columns 8 to 11 of 
Table 4 where we find per cent of agreement ranging from 26 to 
95.7 per cent. It is one thing to talk about agreement ranging 
from 67.7 to 77.9 per cent as Dunlap and his associates do: it is 
quite another thing to talk about substituting one procedure for 
another when you may have almost any possible degree of 
agreement between the two. 

The explanation for these wide variations in agreement between 
weighted and unit scales lies in the fact that the measurement of
agreement is in terms of ratings and not scores. 

When ratings are compared it must be remembered that B+,
B, B- and C+ ratings each cover a range of five standard scores,
equivalent to one-half a standard deviation of the criterion 
group. But A ratings include a range from 45 to about 70, and 
C ratings, from 24 to 0 and sometimes below that. When the 
statement is made, for example, that there is seventy per cent
agreement and thirty per cent shift of one-letter rating between 
two scoring procedures, it means that there have been enough 
shifts within the range of 20 to 50 to constitute thirty per cent 
of the total, since shifts within the range of 0 to 20 would be 
from C to C and would not be counted, and similarly shifts within 
the range of 50 to 70 would be from A to A and would not be 
counted. In terms of scores, not ratings, over the whole range 
there would be more half-sigma shifts than reported and, conse- 
quently, less than seventy per cent agreement. All this may be 
illustrated by a comparison of weighted and unit scores on the 
Y.W.C.A. Secretary scale given in Table 5. Here we have, for 
example, ninety per cent agreement in letter ratings among 
students, but only 32.5 per cent agreement if measured in 
one-half-sigma steps over the whole range of scores. The reason
for the great discrepancy in these two measures is that 84.5 per 
cent of weighted and 80 per cent of unit scores are C ratings and 
all shifts between scores within the C-range are ignored when 
measuring agreement by ratings. 
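
The effect can be seen in a small worked example. The scores below are invented for illustration and are not taken from any table; the function names are likewise hypothetical. A low-scoring group is given unit scores uniformly three points above its weighted scores, and agreement is counted first by letter ratings and then by one-half-sigma steps of five standard scores over the whole range.

```python
def letter_rating(score):
    """Letter rating from a standard score, using the cut-offs 45, 40, 35, 30, 25."""
    for lower_bound, rating in [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]:
        if score >= lower_bound:
            return rating
    return "C"


def half_sigma_step(score):
    """Index of the five-point (one-half-sigma) band into which a score falls."""
    return int(score // 5)


def agreement(pairs, classify):
    """Per cent of (weighted, unit) pairs placed in the same class."""
    same = sum(1 for w, u in pairs if classify(w) == classify(u))
    return 100.0 * same / len(pairs)


# Hypothetical weighted scores of a low-scoring group; unit scores are 3 points higher.
weighted = [2, 6, 9, 11, 13, 14, 16, 18, 22, 31]
pairs = [(w, w + 3) for w in weighted]

print("by letter ratings:      ", agreement(pairs, letter_rating))
print("by one-half-sigma steps:", agreement(pairs, half_sigma_step))
```

With these invented scores the agreement is ninety per cent by letter ratings but only forty per cent by one-half-sigma steps, because every shift that stays inside the C range goes unrecorded when ratings are compared.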

The above situation explains the wide variation in percentage 
of agreement in ratings shown in Tables 2, 3 and 4, and the somewhat
similar variations which we feel sure must exist between
different groups within Dunlap’s own data. With the same 
average difference between scores from two scoring systems there 
will be widely varying percentages of agreement between ratings, 
depending upon the percentage of scores which have C+, B-, B
and B+ ratings, or, in other words, upon the mean score of the 
distribution. 


TABLE 5. PERCENTAGE OF AGREEMENT BETWEEN WEIGHTED
AND UNIT SCORES IN TERMS OF (1) ONE-HALF-SIGMA STEPS
AND (2) LETTER RATINGS

                               By one-half-sigma steps        By letter ratings
                               Y.W.C.A.  Librar-  Stu-        Y.W.C.A.  Librar-  Stu-
                                         ians     dents                 ians     dents

Up 2 one-half-sigma steps...     0.5      21.0     13.0         0.5      11.0      1.0
Up 1 one-half-sigma step....    15.3      46.0     53.5         6.4      23.0      8.0
Agreement...................    60.9      30.0     32.5        83.7      64.0     90.0
Down 1 one-half-sigma step..    23.3       3.0      1.0         9.4       2.0      1.0























Table 6 gives the maximum percentage of agreement between 
letter ratings when every score on one scale is shifted .35 sigma 
from the corresponding score on the other scale.* If, for example, 
the mean score of the distribution is 55, over half of the scores 
are above 50 and none of these A ratings can shift to B+ (a 
score of 40 to 44). Similarly when the mean score is 15, over 
half the cases are below 20 and, being all C ratings, cannot shift 
to C+. But when the mean score is 35, the bulk of the cases lie 
within the range of 20 to 50 and most of the shifts of .35 sigma
mean a change in rating.
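
The dependence of maximum agreement on the mean score can be sketched numerically. The code below is an approximation under stated assumptions, not the computation actually used for Table 6: scores are taken to be continuous and normally distributed with a standard deviation of 10, half of them are moved up 3.5 standard scores and half down, and the rating cut-offs are 25, 30, 35, 40 and 45. The function name is hypothetical.

```python
from statistics import NormalDist

CUTS = [25, 30, 35, 40, 45]  # lower bounds of C+, B-, B, B+ and A


def max_agreement(mean, sd=10.0, shift=3.5):
    """Per cent of ratings unchanged when half the scores move up and half down by `shift`."""
    dist = NormalDist(mean, sd)
    # A score moved upward changes rating if it lies within `shift` below a cut-off.
    up = sum(dist.cdf(c) - dist.cdf(c - shift) for c in CUTS)
    # A score moved downward changes rating if it lies within `shift` above a cut-off.
    down = sum(dist.cdf(c + shift) - dist.cdf(c) for c in CUTS)
    changed = 0.5 * up + 0.5 * down
    return 100.0 * (1.0 - changed)


for mean in (55, 45, 35, 25, 15):
    print(mean, round(max_agreement(mean), 1))
```

Under these assumptions agreement is lowest, somewhat under fifty per cent, when the mean is 35, and rises symmetrically on either side. The values come out near those of Table 6, though not identical with them, since the continuous approximation need not match the way the tabled figures were worked.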

The data in Table 4 have been reassembled on the basis of 
mean score and are given in Table 6 so that comparison can be 





* A standard error of a single score of .35 is equivalent to a reliability 
coefficient of .88. Shifting half the scores upward and half downward 3.5 
standard scores is close enough for our purposes to what would occur if the 
scores were shifted according to normal expectancy.

made between the calculated and actual percentages of agreement.
If the two sets of data are plotted, it is easily seen that
they follow the same general course and for the most part the two 
curves overlap. The rank correlation between the 29 percentages 
of agreement and the calculated percentages of ‘maximum agree- 
ment’ is .72, which indicates that percentage of agreement is 
affected by the mean score of the distribution. 


TABLE 6. MAXIMUM AGREEMENT WHEN SCORES ON ONE SCALE
ARE ALL SHIFTED .35 SIGMA FROM THE CORRESPONDING
SCORES ON THE OTHER SCALE; ALSO THE PERCENTAGES
OF AGREEMENT FROM TABLE 4 ACCORDING TO THE
MEAN SCORES ON THE WEIGHTED SCALES

                          Actual Percentage of Agreement
                          According to Mean Score
Mean      Maximum                Mean      Per Cent Agreement
Score     Agreement       N      Score     (Table 4)

55          83.6
50          71.3          7      50.0          87.2
45          58.0
40          47.9
35          44.0          2      37.7          57.0
30          47.9          2      30.3          44.5
25          58.0          7      27.0          61.2
                          5      22.9          66.9
20          71.3          2      20.2          60.3
15          83.6          3      16.2          85.0
10          88.           2      12.5          90.5

















All this means that if one selects interest blanks of people
who score low on a scale one will obtain higher percentages of
agreement between their ratings than if one uses blanks of 
people who average 30 to 40 on the scale. Dunlap and his 
associates used the blanks of students who average low on most 
scales. They obtained relatively high agreement, as should be 
expected. But if one uses blanks of librarians, for example, who 
average 39 on the English teacher weighted scale, one obtains
only 46 per cent agreement between letter ratings on the weighted
and unit scales. Our own data on students scored on seven pairs 
of scales give 70.6 per cent agreement, but the per cents range 
from 88.5 (mean score of 12.5) on Y.W.C.A. Secretary scale to 
63 per cent (mean score of 27.5) on old Librarian scale. 

Percentage of agreement between ratings is consequently a 
precarious method of comparing two scoring procedures, for the 
percentage obtained must be evaluated in terms of the mean 
score of the blanks used. 


RELIABILITY 


The reliability of unit scales is practically the same as that
of weighted scales, i.e., .863 and .870, respectively (see Table 7).


TABLE 7. RELIABILITY OF WEIGHTED AND UNIT SCALES

Occupation                        Weighted Scale    Unit Scale

(a) Used in Table 4
Librarian (old)...............        .864             .899
Librarian (revised)...........        .888             .908
Nurse (revised)...............        .860             .819
[illegible]...................        .899             .850
English Teacher...............        .818             .800
Home Economics Teacher........        .896             .888
Y.W.C.A. Secretary............        .868             .875
Average.......................        .870             .863

(b) Additional Scales
[illegible]...................        .806             .800
Laboratory Technician.........        .903             .875
[illegible]...................        .859              —
Physician (revised)...........        .916             .896
Social Worker.................        .789             .831
Home Economics Teacher........        .897             .880
Math-Science Teacher..........        .873             .878
Average.......................        .863             .856


Our second question asked whether the lack of agreement between
weighted and unit scales was attributable to the unreliability of
the scales.

In order to answer this question we have calculated the variation
in scores between the two types of scales when the reliability
of the scales is .87. Such a reliability is associated with a stand- 
ard error of a single score of 3.6, which means that 68 per cent 
of obtained scores do not diverge from their corresponding 
estimated true scores by more than that amount. In one 
hundred cases there would be differences between the obtained 
and estimated true scores ranging from 10 to —10, with a mean 


TABLE 8. PERCENTAGE OF AGREEMENT OF RATINGS BETWEEN
WEIGHTED RATINGS AND (1) UNIT RATINGS AND (2)
‘CALCULATED’ RATINGS

                            Weighted vs.                 Weighted vs.
                            Unit Scores                  Calculated Scores
                         Librar-  Stu-     Aver-      Librar-  Stu-     Aver-
                         ians     dents    age        ians     dents    age

3 Ratings Up...........  [illegible]
2 Ratings Up...........   12.6      3.0      7.8        1.0      1.5      1.2
1 Rating Up............   18.9     22.0     20.5       10.4     10.5     10.5
Same Rating............   65.3     70.0     67.7       79.2     76.0     77.6
1 Rating Down..........    2.1      4.0      3.0        8.4     12.0     10.2
2 Ratings Down.........  1.0 and .5 [columns uncertain]
Differences in Scores..    4.0      4.2      4.11        .3       .4       .31
Sigma of Differences...                      3.95                         3.85























of 0 and standard deviation of 3.6. These one hundred differ- 
ences were recorded on cards which were thoroughly shuffled. 
Each weighted score was then assigned a ‘calculated’ score 
according to the amount on the card that happened to be turned 
up. For this purpose we used the weighted scores of twenty 
women students and twenty women librarians on five occupa- 
tional scales, providing us with two hundred scores and the same 
number of calculated scores. The percentages of agreement
between weighted and unit scores are given in the first part of
Table 8 and the percentages between weighted and ‘calculated’
scores are given in the second part of the table. Two differences
appear, namely: (1) the percentage of agreement is higher with 
calculated scores than with unit scores and (2) the distribution of 
differences centers about zero with calculated scores instead of 
about 4.1 with unit scores. On the other hand, the standard 
deviation of the differences is practically the same in both cases. 
Apparently the distribution of differences between weighted and 
unit scores is what should be expected from the unreliability of 
the scales, but unit scales give higher scores than weighted scales 
and this complication decreases materially the agreement between 
ratings. 
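
The card procedure can be imitated in a few lines. This is a sketch of the idea only, not a reconstruction of the actual deck: it assumes the usual formula for the standard error of a single score, sd * sqrt(1 - r), builds one hundred "cards" at evenly spaced normal quantiles, and shuffles them with a seeded random number generator. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def standard_error(reliability, sd=10.0):
    """Standard error of a single score, assumed to be sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def error_deck(n_cards, sigma):
    """n_cards differences at evenly spaced normal quantiles (mean 0, SD close to sigma)."""
    unit_normal = NormalDist()
    return [sigma * unit_normal.inv_cdf((i + 0.5) / n_cards) for i in range(n_cards)]


def calculated_scores(weighted, sigma, seed=0):
    """Add a shuffled card value to each weighted score, reusing the deck as needed."""
    deck = error_deck(100, sigma)
    random.Random(seed).shuffle(deck)
    return [w + deck[i % len(deck)] for i, w in enumerate(weighted)]


sigma = standard_error(0.87)          # about 3.6 standard scores
deck = error_deck(100, sigma)
print(round(sigma, 2), round(fmean(deck), 2), round(pstdev(deck), 2))
```

Because the errors so generated center on zero, ratings built from them can be compared with the weighted ratings to show what disagreement unreliability alone would produce; any constant excess of unit over weighted scores is a separate matter.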

McClelland reports higher correlations between weighted and
unit scales for the Bernreuter Personality Inventory than
reliability coefficients for the weighted scales and concludes that
the “shortened score is at least as good an index of the total score
as the total score is of itself on a second occasion.” McClelland’s
conclusion must be discounted, since Kempfer reports much
lower correlations between weighted and unit scales. Leaving
aside the facts, what shall we say of the argument itself: is it 
sound? 

Two objections to the argument are apparent: first, the two 
sets of correlation coefficients are not comparable, and, second, 
reliability is not a substitute for validity. 

Statistics is a procedure whereby certain concepts are expressed 
by figures; thereafter the figures must always be interpreted in 
terms of what they signify. If in two separate investigations 
two different concepts are expressed by the figure 40, for example, 
we may not draw conclusions as to the relationship between the 
two concepts merely because they are both represented by 40. 
To do this is comparable to saying that the temperature in two
rooms is the same when a Fahrenheit thermometer reads 40 in
one room and a centigrade thermometer reads 40 in the other
room. In the case of McClelland’s coefficients we have correla- 
tions between odd and even items, registering reliability, and 
between weighted and unit weights for all items, registering 
relationship between the two scales. The latter coefficients are 
spuriously high as respects the former for, in the former, responses 
to half the items are correlated with responses to the other half 
of the items, whereas in the latter responses to all the items with 
one weighting are correlated with responses to all the items with 




another weighting, the two systems of weighting being highly 
correlated with one another. If we could determine the degree 
of spuriosity we could express the coefficients of one system in 
terms of the other as we can with the two methods of measuring 
temperature. But until we can measure the relationship we 
cannot say a coefficient of .90 between weighted and unit scales 
is equal to a reliability of .90. 

The second objection to McClelland’s argument is that 
reliability is accepted in lieu of validity. In a test such as the 
Bernreuter with no good criterion to check against and, there- 
fore, no known validity, it is natural to emphasize reliability, as 
McClelland has done. The data in Table 8 show, however, that 
with approximately the same reliability appreciable differences 
in validity may occur.* We must not forget that correlation 
is a measure of rank-order and that it does not reveal whether 
one distribution scores higher or lower on a scale than another 
distribution. Where the amount of the score has attached to it 
a definite meaning, as in the case of interest scores, shifts in 
amount change the resulting interpretation even though rank- 
order of scores remains the same. 
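
The point that correlation is blind to a constant difference in level can be shown with invented figures. In the sketch below the scores are hypothetical, not data from this study, and the function names are illustrative: every unit score is made exactly four points higher than its weighted companion.

```python
from statistics import fmean


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def letter_rating(score):
    """Letter rating from a standard score, using the cut-offs 45, 40, 35, 30, 25."""
    for lower_bound, rating in [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]:
        if score >= lower_bound:
            return rating
    return "C"


# Hypothetical weighted scores; each unit score is exactly 4 points higher.
weighted = [22, 27, 31, 33, 36, 38, 41, 43, 47, 52]
unit = [w + 4 for w in weighted]

r = pearson_r(weighted, unit)
changed = sum(letter_rating(w) != letter_rating(u) for w, u in zip(weighted, unit))
print(round(r, 3), changed, "of", len(weighted), "ratings changed")
```

The correlation is perfect, yet eight of the ten ratings change, which is the sense in which equal rank-order does not guarantee equal interpretation.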


VALIDITY OF WEIGHTED AND UNIT SCALES 


An occupational interest scale is based directly on the con- 
trasting interests of men in an occupation and men in general. In 
the absence of any other objective criterion we must accept such 
scales as measuring what they purport to measure. Although 
we do not know what are the true differences in interests between 
any two occupational groups, we can definitely prefer that scale 
which best differentiates the two groups, for that is what the 
scales were designed to do. 

Differentiation can be expressed here in terms of differences in 
scores and in terms of percentage of overlapping of distributions. 
Since the standard score of 50 is the mean of the occupation on 





* Unit scale scores differ from their companion weighted scores approximately
as ‘calculated’ scores differ from their companion weighted scores
(sigmas of differences of 3.95 and 3.85, respectively). But unit scale scores
are 4.11 higher than weighted scores whereas ‘calculated’ scores are only
.31 higher. That is, unit scores are less valid by an amount equal to .38
sigma of the distribution of scores of a criterion group.

its own scale, any score below 50 automatically registers the
degree of differentiation in terms of standard deviations of the 
criterion, where a difference of 10 equals the standard deviation. 
In the case of our small sample the unit scale scores average 4.1 
higher than the weighted scale scores (Table 8) which means 
that the unit scales do not differentiate as well as weighted scales 
to the extent of .41 sigma. 

Similar data, but based on thirty-six groups, involving 9,863 
cases, are given in Table 4. Here the differences between means 
of weighted and unit scales range from 0.6 to 7.3, with an average 
of 3.2, which is one-third of the standard deviation of the criterion 
groups. 

A difference of 3.2 in mean scores can cause a considerable shift
in ratings from B- to B, from B to B+, etc., the amount depending
upon the mean score of the group. Assuming the standard
deviation is 10 on both scales, then we have the following shifts 
in ratings. 





    Mean Scores                  Per cent shifts in ratings
Weighted     Unit       B- to B    B to B+    B+ to A    Total

   40        43.2        11.4       14.1       13.6       39.1
   30        33.2        13.6       10.2        6.0       29.8
   20        23.2         6.0        3.0        1.0       10.0




















Data on students who average low on most scales must yield 
rather small differences in ratings between weighted and unit 
scales, but data on certain occupational groups who score high 
on related occupational scales will have many more high ratings 
on unit scales than on weighted scales. 
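
How the number of upward shifts depends on the mean of the group can be sketched as follows. The code assumes continuous, normally distributed scores with a standard deviation of 10 and a uniform upward displacement of every score; the function name and the displacement parameter are illustrative. It is a rough approximation and need not reproduce the percentages tabulated above, which may rest on a different grouping of scores.

```python
from statistics import NormalDist


def upward_shifts(mean, shift, sd=10.0):
    """Per cent of a group whose rating rises when every score is raised by `shift`."""
    dist = NormalDist(mean, sd)
    cuts = {"B- to B": 35, "B to B+": 40, "B+ to A": 45}
    # A score crosses a cut-off if it lies within `shift` points below it.
    result = {label: 100.0 * (dist.cdf(c) - dist.cdf(c - shift)) for label, c in cuts.items()}
    result["Total"] = sum(result.values())
    return result


for mean in (40, 30, 20):
    row = upward_shifts(mean, shift=3.2)
    print(mean, {label: round(pct, 1) for label, pct in row.items()})
```

The pattern is the one described in the text: a group averaging 40 shows several times as many shifts into the B, B+ and A ratings as a group averaging 20.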

Lack of agreement of ratings is to be explained to some degree 
by unreliability of the scales and by the lesser degree of differ- 
entiation of unit scales, but the very great differences in agree- 
ment between ratings ranging from 26.0 to 97.5 per cent (Table 4) 
are attributable to the distribution of scores. If the mean of the
distribution is 20, there cannot be many shifts in ratings; if the 
mean is 40, there can be four times as many shifts.

In estimating the effect of using unit rather than weighted
scales it must not be overlooked that the above is based on the 
average difference between them. In half of the cases the 
differences are larger than 3.2, ranging up to double that amount 
with resulting larger differences in shifts between ratings than 
given above. 


Percentage of Overlapping.—Another way of dealing with the 
above data is to calculate the percentage of overlapping between 
a group and the criterion group on both the weighted and unit 
scales. The data on twenty-nine such comparisons are given in 
the second section of Table 4. Here the standard deviations of 
each group are taken into account as the formula involves the 
differences in means divided by the average of the two sigmas. 
The resulting quotients are converted to percentage of over- 
lapping by reference to Tilton’s table. These percentages 
express the total overlapping, i.e., the percentage of the two 
groups that have the same scores. Thus, librarians overlap 
thirty-one per cent with physicians on the physician weighted 
scale and forty-seven per cent on the unit scale. This difference 
of sixteen per cent is the largest in the table; the smallest is 
zero per cent and the average is 5.2 per cent. Weighted scales 
differentiate these groups from the criterion group 5.2 per cent 
better than do unit scales. This is true on the average, but in 
ten of twenty-nine comparisons unit scales are less valid than 
weighted scales by six to sixteen per cent and in two comparisons 
the differences are twelve and one-half and sixteen per cent. 
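
The overlap calculation described above can be sketched in a few lines. Tilton's table itself is not reproduced in the text, so this sketch assumes it tabulates the area shared by two normal curves of equal spread; the function name and the example figures are illustrative, not taken from Table 4.

```python
from statistics import NormalDist


def overlap_percent(mean_a, sd_a, mean_b, sd_b):
    """Percentage of overlap between two score distributions.

    Assumes both distributions are normal with a common spread equal to
    the average of the two sigmas (an assumption of this sketch).
    """
    avg_sd = (sd_a + sd_b) / 2.0
    # Difference in means divided by the average of the two sigmas.
    d = abs(mean_a - mean_b) / avg_sd
    # Area common to the two curves, expressed in per cent.
    return 200.0 * NormalDist().cdf(-d / 2.0)


# Hypothetical groups whose means lie two average sigmas apart.
print(round(overlap_percent(50.0, 10.0, 30.0, 10.0), 1))  # 31.7
```

Under these assumptions, an overlap near thirty-one per cent, such as that quoted for librarians on the weighted physician scale, corresponds to means about two average sigmas apart.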


Reassignment of Norms.—Since unit scales give on the average 
3.2 higher scores than weighted scales, cannot new norms be 
assigned to unit scales to compensate for this defect in validity? 

The differences between weighted and unit scores for different 
groups on several scales are given in Table 4. On the basis of 
these data it would appear likely that a different adjustment 
would need to be made not only for each scale, but also for each 
group scored on each scale if the adjustment was to be exact. 
This is impracticable, but it is possible to make an adjustment 
for each scale. Some groups would then still score too high or 
too low, but the average would give appreciably smaller differ- 
ences in mean scores between weighted and unit standard scores 
than recorded in the table. But what would this mean? 

We made such an adjustment so that women students would
obtain mean scores equivalent to weighted scores on the Y.W.C.A.
Secretary scale. When Y.W.C.A. Secretary blanks were scored
on this basis there resulted a mean score of 46.9 instead of 50.0.
The score of 50 on the adjusted basis does not equal the mean of
the criterion group, but lies 3.1 standard-score points above it.
So we are back where we were at the start: unit scores are still
higher than weighted scores, differentiating occupational groups
from one another less well than weighted scores.

When Dunlap calculated norms for unit scores on the basis of 
regression equations he was making the kind of adjustment 
discussed here. It was natural to suppose that a 30 on his 
unit scale was the same as 30 on our weighted scale. If he had 
scored criterion group blanks on his unit scales he would have 
discovered that such scales do not give mean scores of 50 for the 
criterion group but something less than 50, and consequently 
different meanings would need to be attached to his unit scores 
from those attached to our weighted scores. 


DIFFERENCES IN COUNSELING BASED UPON WEIGHTED AND UNIT 
SCALES 


From the pragmatic point of view we could ignore all of the 
above statistics if we found that the same counseling resulted 
from unit scores as weighted scores. Even Dunlap admits there 
are differences here but not great enough, in his judgment, to 
offset the greater convenience of using unit scales. Is his esti-
mate of the amount of difference in counseling adequate? 

We cannot accept the simple, mechanistic manner of inter- 
preting interest scores which Dunlap and his associates uphold 
(referred to above). In counseling, apparently only A and 
B+ ratings receive any attention from them whereas we find 
lower ratings contribute to the total interest pattern and often 
influence the final occupational choice. (Chapter 17)* (Low
ratings should always be noted, for they indicate what occu- 
pations the man should not enter more surely than high ratings 
indicate what he should enter.) We cannot believe, moreover, 
that these writers would give the same advice to a boy undecided 
between engineering and medicine who had the following scores 
on weighted scales 

[weighted-scale scores not recoverable]

and these scores on unit scales


(occupation illegible)    48 A
(occupation illegible)    43 B+


and yet shifts of this amount occur. But apparently shifts of 
this sort are of no concern to them, as they only discuss those 
from B+ to B. 

The discrepancies between weighted and unit scores that need 
to be noted are increases or decreases in score within the range of 
B- to A ratings and the changes in rank order of scores caused by
some scores increasing and others decreasing. Shifts from B-
to C and the reverse need not concern us. Discrepancies cannot, 
however, be adequately measured by counting only the number 
of shifts from B+ to B. 

The only feasible way to estimate the differences between 
weighted and unit scores is to compare the counseling that 
would result from each. Unfortunately, we are able to make 
such comparisons on the basis of only six pairs of scales, since 
we have developed unit scales for only six of the eighteen occu- 
pational interests of women. 

Such estimates are largely subjective. Because of this we 
have rated the records of the first twenty students and the first 
twenty librarians in our files in three different ways on three 
separate occasions. The results have agreed surprisingly well. 
In the third procedure weights of one-quarter, one-half and one 
were assigned to each case according to the degree of difference 
in counseling that results. Examples are given below so that 
the reader may check our estimates. Data from weighted 
scales are given in the first column, data from unit scales in the 
second column. 


Librarian #1. Confusion, 14. 
Librarian 58 A 57 A 
English Teacher 41 B+ 48 A 


English Teacher should be given more serious consideration 
with the unit scale scores than with the weighted scores. 


Librarian #7. Confusion, 4.
Librarian 43 B+ 43 B+ 
Nurse 41 B+ 43 B+ 
Physician 29 C+ 37 B 


Nurse is a better choice in the second case than the first case 
because it is better supported by the related physician interest 
in the second case than the first. 


Student #3. Confusion, 4. 
Librarian 33 B- 38 B 


As these are in both cases the highest score they must be 
given undue consideration. The former is not much to rely 
upon, whereas under the circumstances the latter is some indi-
cation of the student’s bent.


Librarian #11. Confusion, 4
Librarian             44 B+    43 B+
Y.W.C.A. Secretary    43 B+    51 A
Nurse                 36 B     37 B
English Teacher       34 B-    36 B


A tie between Librarian and Y.W.C.A. Secretary becomes a first 
choice for Y.W.C.A. Secretary and second choice for librarian. 


Student #8. Confusion, 4 
Home Economics 38 B 34 B- 
Librarian 31 B- 35 B 
English Teacher 27 C+ 30 B 
Physician 22 C 33 B- 


Although these are low scores, they are her highest scores. Home
economics drops from first choice to second choice and where
ratings are alone considered, not scores, it drops out of the
running.


Librarian #6. Confusion, 1.
Librarian 49 A 47 A 
English Teacher 42 B+ 46 A 
Y.W.C.A. Secretary 41 B+ 48 A 
Nurse 35 B 40 B+ 

Librarian is first choice with weighted scales and is tied with
two other interests with unit scales. Similarly Y.W.C.A. 
Secretary is third choice with weighted scales and is tied with 
two other interests with unit scales, having actually the highest 
score of all. Nurse would be disregarded in the first case, but 
should receive consideration in the second case. (A confusion 
weight of 1 may be too high in the light of the other examples, 
but in the light of actual counseling situations the writer is 
mindful of the different conclusions the individual might reach 
both in terms of the scores and of her own background of experi- 
ence and possibilities of further education.) 
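
The comparisons in the examples above can be reproduced mechanically. The sketch below converts standard scores to letter ratings and lists the change from weighted to unit scale for one record. The cut-offs (45, 40, 35, 30, 25) are an assumption inferred from the scores and ratings printed in the examples; they are not stated in the text, and the function names are illustrative.

```python
# Rating floors inferred from the worked examples (an assumption of this sketch).
CUTOFFS = [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]


def rating(score):
    """Return the letter rating for a standard score."""
    for floor, letter in CUTOFFS:
        if score >= floor:
            return letter
    return "C"


def compare(record):
    """record maps scale name -> (weighted score, unit score).

    Returns one row per scale: name, weighted rating, unit rating,
    and the unit-minus-weighted difference in score.
    """
    rows = []
    for scale, (weighted, unit) in record.items():
        rows.append((scale, rating(weighted), rating(unit), unit - weighted))
    return rows


# Scores of Librarian #6 as printed above.
librarian_6 = {
    "Librarian": (49, 47),
    "English Teacher": (42, 46),
    "Y.W.C.A. Secretary": (41, 48),
    "Nurse": (35, 40),
}

for row in compare(librarian_6):
    print(row)
```

A listing of this kind shows rating shifts and changes in rank order, but the confusion weight itself remains a counselor's judgment and is not computed here.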

Using such confusion scores as the above we obtain a total of 
4.75 upon twenty librarians and 3.00 upon twenty students or 
appreciable confusion in one of every six cases. Even if the 
weights were cut in half, the proportion of one in twelve would 
be still very considerable. Our finding differs from that of 
Dunlap’s ‘one in thirty-three’ in that we have considered all 
pertinent scores and he has taken account of only the shifts 
from B+ to B. 


SUMMARY 


Comparison of unit and weighted scales has been made in 
terms of six measures. Dunlap and his associates demonstrated 
that the correlation between scores on the two scales is .961, 
slightly higher than our own average. But even that high 
coefficient yields only seventy-two per cent better than chance 
agreement, which means that the two types of scales can give 
appreciably different results. They based reliance primarily 
upon the percentage of agreement between ratings on the two 
scales. The average of five of their sets of data agrees closely 
with our own, except that we have 3.7 per cent shifts of 2- and 
3-ratings compared with their 0.9 shift of 2-ratings. It is 
extremely important to note that, although the average per- 
centage of agreement in ratings may be 72.6, such agreements 
range all the way from 26.0 to 97.5 per cent. 
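
The text does not say how the figure of seventy-two per cent better than chance is obtained from a correlation of .961. It agrees with the index of forecasting efficiency, one minus the square root of one minus r squared, and the sketch below assumes that this is the measure intended.

```python
import math


def forecasting_efficiency(r):
    """Index of forecasting efficiency: 1 - sqrt(1 - r^2).

    Assumed here to be the 'better than chance' measure referred to
    in the text; the text itself gives no formula.
    """
    return 1.0 - math.sqrt(1.0 - r * r)


print(round(100 * forecasting_efficiency(0.961), 1))  # 72.3
```

On this reading, even a correlation of .961 leaves more than a quarter of the chance error of prediction unremoved.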

Such wide variations in percentage of agreement of ratings are a
reflection of the mean of the distribution of the group. If the
mean is low, there can be few B and A ratings and, hence, few 
shifts and there must be high agreement because most of the 
ratings are C in both cases. Dunlap’s data are based on the 
records of students who score low on most scales. Data based
on librarians, on the other hand, who score fairly high on the 
physician scale, give only twenty-six per cent agreement in 
ratings and 39.5 per cent of 2- and 3-rating shifts. Percentage 
of agreement of ratings is accordingly a very precarious measure 
to use in this connection. 

The third measure of similarity between weighted and unit 
scales is reliability of scales where we find no difference between 
the two. 

The fourth measure compares mean scores on the two scales, 
and the fifth measure compares the percentage of overlapping 
between a group and the criterion group on both the weighted 
and unit scales. Here there is good evidence that there is, on 
the average, better differentiation with weighted than with unit 
scales and that in some cases the differences are striking. 

The sixth measure concerns differences in counseling based on 
weighted and unit scale scores. Dunlap concludes that in only 
one case in thirty-three would there be a shift in ratings from 
B+ to B and that only such shifts are of concern here. We, on 
the other hand, emphasize not merely the one, two or three 
highest scores, but the total pattern of interest scores. Shifts 
within the range of B— to A ratings and the reverse may affect 
the pattern, and, when they do, they must be considered. On 
such a basis unit scale scores will lead to different counseling 
from weighted scale scores in from one-sixth to one-twelfth of
the cases.


RATE OF WORK IN READING PERFORMANCE
AS MEASURED IN STANDARDIZED TESTS* 


MILES A. TINKER 


University of Minnesota 


The relation between speed of reading and comprehension 
continues to receive attention, partly because unequivocal 
experimental results are not yet available in the field. The 
variation in results, as pointed out by Tinker*® and later by 
Blommers‘ and Lindquist,? is due largely to the techniques of
measurement used and to differing concepts of speed and of 
comprehension. In the early studies, scores on speed of reading 
tests were correlated with scores on reading comprehension tests. 
After Tinker’s criticism‘ of this technique, other methods have 
been employed. In 1939 Tinker® put forward the view that 
speed of reading means speed of comprehension. He said: “It
would seem that the adequate technique for discovering the
true relation between speed and comprehension in reading is to
measure rate of work and comprehension on the same or strictly
comparable material in each specific reading situation.” In
1944, Blommers and Lindquist? adopted this basic concept.
In studying the relation between rate of comprehension and 
comprehension in reading they say that “A third criterion . . .
is that the rate and comprehension scores both be based on the 
same materials, or on materials equivalent in content and read 
for equivalent purposes.” 

In an initial study from this point of view, Anderson and 
Tinker! discovered a correlation of -.80 between rate of work
(speed of reading) and power of comprehension. The Iowa 
Silent Reading Test (Advanced) was used with university 
sophomores as subjects. At that time it? was pointed out that 
this reading test was relatively easy for the readers employed. 
The writers asked, “Would the correlations have been different
if a more difficult test had been used?” In another investigation, 
employing various reading tests, Tinker® found lower correlations 
between rate of work and power of comprehension for ‘difficult’ 
than for ‘easier’ reading tests. Questions concerning the inter- 
pretation of this apparent trend were raised. 





* The expense of this study was met by a research grant from the Graduate 
School, University of Minnesota. 

A more adequate check of the question raised by Anderson and
Tinker! concerning the effect of difficulty of reading material on 
the relation of rate of work to power of comprehension would 
be to use the same test (Iowa Silent Reading Test: Advanced) 
with less mature readers and compare the findings with those 
obtained in the earlier study. This has been done in the present 
study. The purpose, then, is to determine the relation between
rate of work (speed) and comprehension in reading for a standard- 
ized reading test and compare the results with those obtained 
from more mature readers. 

The standardized test used was Form A of the revised Iowa 
Silent Reading Test: Advanced. Only the first five parts and 
the total of these were employed. All subtests yield comprehen- 
sion scores. This Advanced Test is designed for high-school 
students and college freshmen. 

The subjects in the experiment were one hundred high-school 
freshmen. Each student was tested individually. The standard 
time limits listed for parts of the test were used. Empirical 
check revealed that only a few of the fastest workers just about 
completed the tests with these time limits. The readers were 
instructed to work rapidly and consistently, but not to sacrifice 
accuracy for speed. Each subject was allowed to work on a 
subtest until the standard time had elapsed. At that point he 
was interrupted and a line drawn across the page below the last 
item attempted. Instructions were then given to complete the 
test and when the last item was finished the total time required 
for the whole subtest was recorded. 

Four scores were derived from the data: (1) The number of 
items done correctly in the standard time. Following customary 
usage this is designated as the ‘power of comprehension score.’ 
(2) The number of items done correctly in unlimited time. 
Again following usage, this is termed the ‘level of comprehension 
score.’ (3) The number of items attempted in standard time. 
This is a measure that has sometimes been used as a rate of work 
score. (4) The total time taken to complete the whole subtest. 
This yields a ‘rate of work score’ for the material read. It is 
probable that the term ‘rate of work’ is more accurate in this 
situation than ‘rate of comprehension.’ 
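
The four scores can be derived from a single subtest record, as the following sketch illustrates. The record layout and field names are hypothetical, chosen only for illustration, and the sketch counts every correct item as one point; it therefore ignores the item weighting used for some subtests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubtestRecord:
    """One reader's record on one subtest (hypothetical layout)."""
    correct: List[bool]          # correctness of every item, in test order
    attempted_in_standard: int   # items above the line drawn at the time limit
    total_seconds: float         # time taken to finish the whole subtest


def derive_scores(rec):
    """Return the four scores described in the text."""
    return {
        # (1) items done correctly within the standard time
        "power": sum(rec.correct[:rec.attempted_in_standard]),
        # (2) items done correctly in unlimited time
        "level": sum(rec.correct),
        # (3) items attempted in the standard time
        "attempted": rec.attempted_in_standard,
        # (4) total time for the whole subtest; a smaller value means faster work
        "rate_seconds": rec.total_seconds,
    }


# Hypothetical reader: six items, four attempted before the time limit.
example = SubtestRecord([True, False, True, True, False, True], 4, 300.0)
print(derive_scores(example))
```

Because the rate of work score is a time, a faster reader has a smaller value, which is why a tendency for faster readers to comprehend more appears as a negative correlation.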

Blommers and Lindquist? have criticised this rate of work score 
on the grounds that it includes rate on items that are done
incorrectly as well as those done correctly. They derived a rate
of work score from exercises done correctly (adjusted rate of 
comprehension score), and employed this as their criterion of 
rate of comprehension. They claim that a rate score based upon 
all reading done (correct plus incorrect exercises) is not valid. 
This claim is not substantiated by their results. Their F-rate 
score, rate of work on exercises done incorrectly, has a reliability 
of .86. It correlates .77 with their criterion of rate of comprehen- 
sion. Although this coefficient may be significantly lower than 
their validity coefficient of .93, nevertheless, the correlation of 
.77 does indicate a fairly close correspondence between rate of 
work on incorrect exercises and rate on correct exercises. Blom- 
mers and Lindquist? also give the correlations between their 
criterion (adjusted rate score) and their WT-Score which is the 
time required to complete the test (working-time score). The 
latter includes both time spent on exercises done successfully 
and exercises done unsuccessfully. The correlation coefficient 
obtained is .95. Reliabilities of the two measures are .81 and 
.84. Apparently the two rate scores are equivalent. Similarly, 
the correlation between their criterion (adjusted rate score) and 
their A-Score (derived from time scores on all exercises) was .96. 
As a matter of fact these correlations of .95 and .96 are slightly 
higher than that obtained by Blommers and Lindquist between 
their adjusted rate score (criterion) and the S-rate score which is 
derived from only those exercises correctly solved. If one 
accepts these authors’ criterion of rate of comprehension, there- 
fore, the rate of work or rate of comprehension measure used in 
the present experiment has excellent validity. Apparently 
relative rate of work is the same whether it is based only upon 
exercises correctly done or on all items attempted irrespective 
of accuracy of response. Furthermore, in the practical measur-
ing situation, it is either impractical or infeasible to separate
rate of work in reading which accomplishes one hundred per 
cent of comprehension as determined by an arbitrary standard 
from rate of work which fails by varying degrees from reaching 
this arbitrary degree of comprehension. 

It may be noted in passing that Blommers and Lindquist? 
misquote Tinker® by partial omission in their criticisms of the 
latter’s paper. They state, “However, Tinker used as a measure
of rate the number of exercises attempted (right and wrong) in a
limited time.” It happens that Tinker also employed time taken
to complete the test as a measure of rate. His conclusions were 
based upon data derived from use of both measures. 

In the present experiment comprehension is defined in terms of 
how it is measured in the Iowa Silent Reading Test: Advanced; 
that is, the number of items completed correctly within a set time 
limit is assumed to be the power of comprehension score. Sim- 
ilarly, the number of items completed correctly when an oppor- 
tunity to attempt every item is provided, is assumed to be the 
level of comprehension score. Although some may not agree 
with the method of measuring power or level of comprehension in 
a particular test, that question need not be considered here. 

The reliabilities of the power of comprehension scores cited 
for the Iowa Silent Reading Test: Advanced are relatively high 
when computed by the chance-half method. For the subtests 
the coefficients range from .59 to .95; for the total comprehension 
score, .95 to .96. The basic data of this investigation are given 
in Table 1. The mean number attempted in standard time is
well below the number of items in each subtest. Differences
between score in standard time and score in unlimited time show
that on the average, level of comprehension is not achieved in
standard time except possibly in Subtest 4.


TABLE 1.—MEANS AND SD’s FOR IOWA SILENT READING TEST:
ADVANCED, FORM A
N = 100 High-School Freshmen

                               No. Correct     No. Attempted   Score in          Total Time
                               Stand. Time     Stand. Time     Unlimited Time    in Seconds
Test                            M       SD      M       SD      M       SD        M        SD
1. Paragraph Meaning           25.7*   11.2    18.7     5.3    39.9*   10.5      936.6    252.5
2. Word Meaning                36.7     8.7    55.8    10.6    41.0     7.9      541.4    165.2
3. Paragraph Organization       8.1*    3.4    25.1     7.0    11.3*    3.9      507.1    182.0
4. Sentence Meaning            18.2     7.7    33.2     7.2    18.6     7.9      271.1     78.0
5. Location of Information     15.9*    5.4    19.5     4.3    21.0*    6.2      478.8    126.0
Total Score                   104.6    28.8   152.4    26.3   131.8    28.4     2735.1    703.3

* Weighted Score.
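
The gap between power and level of comprehension can be read directly from the means in Table 1. The short sketch below subtracts mean score in standard time from mean score in unlimited time for each subtest; the figures are those of the table, and only the variable names are introduced here.

```python
# Mean scores from Table 1 (standard time = power, unlimited time = level).
power = {
    "Paragraph Meaning": 25.7,
    "Word Meaning": 36.7,
    "Paragraph Organization": 8.1,
    "Sentence Meaning": 18.2,
    "Location of Information": 15.9,
}
level = {
    "Paragraph Meaning": 39.9,
    "Word Meaning": 41.0,
    "Paragraph Organization": 11.3,
    "Sentence Meaning": 18.6,
    "Location of Information": 21.0,
}

# Mean gain in score when the time limit is removed.
gains = {name: round(level[name] - power[name], 1) for name in power}

for name, gain in gains.items():
    print(f"{name}: {gain}")
```

The gain is smallest for Sentence Meaning, which fits the remark that only in Subtest 4 is the level of comprehension nearly reached within the standard time.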

The correlations between rate of work and comprehension are
given in Table 2. The correlations between number attempted
in standard time and time to complete the test, Row (e),
reveal that the number attempted in standard time is a fair
measure of rate of work with the exception of Subtest 4, which
deals with sentence meaning. The latter is only -.45, while
the remaining coefficients range from -.78 to -.90. Neverthe-
less, greater emphasis should be placed upon total time than
upon number attempted in standard time as a rate of work
measure.


TABLE 2.—CORRELATIONS BETWEEN RATE OF WORK AND
‘COMPREHENSION’ MEASURES ON THE IOWA SILENT READING
TEST: ADVANCED, FORM A
N = 100 High-School Freshmen

                                                           r for subtests and total
Row   Measures Compared                                     1     2     3     4     5    Total
(a)   Score stand. time vs. no. attempted stand. time      .80   .63   .47   .29   .43    .55
(b)   Score stand. time vs. score unlimited time           .72   .88   .68   .97   .81    .84
(c)   Score stand. time vs. total time*                   -.59  -.56  -.32  -.23  -.41   -.51
(d)   No. attempted stand. time vs. score unlimited time   .23   .26  -.11   .20  -.06    .09
(e)   No. attempted stand. time vs. total time*           -.78  -.90  -.80  -.45  -.89   -.90
(f)   Score unlimited time vs. total time*                -.08  -.21   .24  -.10   .04   -.03

* Total time refers to total time in seconds to attempt all items on a test.

The power of comprehension score correlates fairly high with 
level of comprehension. This is shown by comparing score in 
standard time with score in unlimited time—Row (b). The 
coefficients range from .68 to .97. The two subtests in which 
power and level show least correspondence are on paragraph 
meaning and paragraph organization where the coefficients are 
.72 and .68, respectively. The very high correlation of .97 for 
subtest 4, which deals with sentence meaning, is conditioned by
the fact that the readers had pretty much reached their level of 
comprehension within the standard time limit. For the total 
test, correlation between power and level of comprehension was 
.84. 

In Row (d) of Table 2 are shown the correlations between 
level of comprehension and number of exercises attempted in
standard time. For the subtests the coefficients range from 
—.11 to .26; for the total test, r = .09. Obviously there is 
little or no relationship present. When score in unlimited time 
is correlated with total time—Row (f)—to show the relation 
between level of comprehension and rate of work, the coefficients 
for the subtests range from —.21 to .24; for total test, r = —.03. 
Again there is little or no relationship evident. 

When score in standard time is correlated with number 
attempted in standard time—Row (a)—the coefficients for 
subtests range from .29 to .80; for the total test, r = .55. Thus, 
except for subtest 1, there is only a small to moderate degree of 
relationship between power of comprehension and number of 
exercises attempted within a set-time limit. 

The correlations between score in standard time and total 
time yield the relationship between power of comprehension and 
rate of work. They are shown in Row (c) of Table 2. Coeffi- 
cients for the subtests vary from —.23 to —.59; for the total test 
r = —.51. Thus there appear to be, under the conditions of 
this experiment, significant correlations ranging from a slight 
relationship to a moderate one between rate of work and power 
of comprehension. The relationship is considerably less for
paragraph organization and sentence meaning.
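
The text does not state which test of significance was applied to these coefficients. As a rough check, the sketch below uses Fisher's z transformation with a normal approximation, a substituted method rather than the author's, to ask whether a correlation from N = 100 readers differs reliably from zero.

```python
import math
from statistics import NormalDist


def fisher_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via Fisher's z transformation.

    A large-sample approximation; not necessarily the test the author used.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Largest and smallest subtest coefficients and the total from Row (c),
# followed by the total from Row (d).
for r in (-0.59, -0.23, -0.51, 0.09):
    p = fisher_p_value(r, 100)
    print(r, "significant at .05" if p < 0.05 else "not significant at .05")
```

By this approximation even the smallest coefficient in Row (c), -.23, is distinguishable from zero with one hundred readers, whereas the total of .09 in Row (d) is not.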

Inter-correlations between the subtests for each type of 
scoring are given in Table 3. Power of comprehension (score 
in standard time) varies considerably from subtest to subtest. 
The coefficients range from .24 to .61 with a median of .46. 
Level of comprehension (score in unlimited time) shows a 
similar trend. The coefficients range from .28 to .62 with a
median of .48. This is also true for number of items attempted 
in standard time. For rate of work (total time to complete 
test), however, the picture is somewhat different. The coeffi- 
cients range from .56 to .81 with a median of .72. The three 
lower correlations (.56, .57 and .67) occurred between paragraph 
comprehension vs. paragraph organization, sentence meaning
and location of information. We have evidence from these 
intercorrelations that there is a rather prominent tendency 
toward a general relative rate of work for the subtests of the 
Iowa Silent Reading Test: Advanced. 


TABLE 3.—INTERCORRELATIONS OF FIRST FIVE SUBTESTS OF
IOWA SILENT READING TEST: ADVANCED FOR EACH
TYPE OF SCORING
N = 100 High-School Freshmen


Measure                                   Range of r’s    Median r
Score in standard time                    .24 to .61        .46
Score in unlimited time                   .28 to .62        .48
Number attempted in standard time         .30 to .66        .48
Total time to complete test               .56 to .81        .72


DISCUSSION


In studying the relation between speed and comprehension in 
reading situations, the terminology employed should be carefully 
defined. Earlier workers claimed to be investigating the 
relationship between speed of reading and comprehension in 
reading. Speed of reading, however, was measured in various 
ways and views concerning the relation between rate and compre- 
hension were contradictory.'* As pointed out by Anderson 
and Tinker,' the important problem is the amount of relationship 
between rate of comprehension and degree of comprehension in a 
specific reading situation. Rate of comprehension was in terms 
of time to complete the reading task and comprehension in 
terms of exercises completed correctly within set time limits. 
This technique was also employed by Tinker® in a later study. 
He laid down the principle that rate of comprehension and degree 
of comprehension should be measured on the same or on strictly 
comparable material. Blommers and Lindquist? have accepted 
this principle, but have insisted that rate of comprehension be 
based exclusively upon those exercises in which comprehension 
reaches the level set by the authors (i.e., exercises answered 
correctly) rather than upon time to complete the test. Although 
Blommers and Lindquist present data to show that rate of work 
on ‘incorrectly’ done exercises is not identical with that on 
‘correctly’ done exercises, they also show that rate of work for 
total test is equivalent to rate of work on ‘correctly’ done
exercises. Their findings may be accepted as validating rate of 
work on the total test as a rate of comprehension score. This 
finding facilitates discussion of relationships between rate and 
comprehension in reading in practical situations where one is 
concerned with time spent on a reading task in relation to the 
degree of comprehension achieved. Nevertheless, it is more 
accurate to designate total time spent on a reading task as rate 
of work rather than rate of comprehension. 

Although the number of exercises attempted during a set-time 
limit, which permits the fastest workers to just about complete 
the test, is intimately related to rate of work in terms of time 
taken to complete the whole test (r = —.90), it is possible that 
the number of exercises attempted in standard time is not entirely 
adequate as a measure of rate of work or rate of comprehension. 
The relationship between number attempted in standard time 
and total time fluctuates considerably from one kind of reading 
to another. 

Conclusions concerning the relation between rate of work and 
comprehension in reading must be confined to the specific reading 
situation investigated. This means that the reading level of the 
subjects as well as the kind of reading material used must be 
considered. 

In this investigation high-school freshmen read the Iowa 
Silent Reading Test: Advanced. The test is designed for high- 
school and college students. The plan was to employ a reading 
situation that would be fairly difficult for the readers. In a 
previous study,' university sophomores found this test rather 
easy. 

As noted above, the correlations between rate of work and 
power of comprehension in this study ranged from —.23 to —.59 
for the subtests and r was —.51 for the total score. These coeffi- 
cients reveal a tendency for the faster readers to comprehend 
more. Although the coefficients are not high, they are sub- 
stantial except for parts 3 and 4 of the test. In the earlier 
investigation! where university sophomores read the same test, 
the correlations between rate of work and power of comprehension 
ranged from —.48 to —.85 for the subtests and r was —.80 for 
total score. Again parts 3 and 4 yielded the lower correlations. 
For the other parts and for the total test there was, however, a 
marked tendency for the faster readers to comprehend more.
It is reasonable to assume that the Iowa Silent Reading Test: 
Advanced was a more difficult reading situation for the high-
school freshmen than for the university sophomores. It follows,
then, that the relation between rate of work and power of 
comprehension is determined in some degree by the level of 
difficulty of the reading material. In other words, in relatively 
easy reading this relation is fairly high, but in more difficult 
material, the relation is still significant but only low to moderate 
in size. ‘* 

Blommers and Lindquist? found the relation between rate of 
comprehension and comprehension to be approximately .30. 
They used specially prepared reading material with high-school 
juniors and seniors as subjects. Judging from the sample cited 
the material required a single response for each exercise. There 
could be considerable comprehension of their material and a 
reader still miss the response required to reach the standard set 
by the authors. As suggested by Tinker,’ the nature of the 
material and the type of response required of the reader may 
influence the relation between rate of comprehension and power 
of comprehension. 

The failure in this study to discover any significant correlation 
between rate of work and level of comprehension (comprehension
score in unlimited time) is of interest. This finding appears to 
be due partly to the fact that some of the subjects were at or 
nearly at their level of comprehension at the end of the standard 
time limit. Others improved their scores markedly with 
extended time. In spite of such variations, however, power of 
comprehension and level of comprehension were fairly closely 
related. 

How are the different findings for relations between speed of 
reading and comprehension to be evaluated? The earlier
results, as analyzed by Tinker‘ may be dismissed as not valid 
because they are derived from inadequately conceived experi- 
ments. Sample conclusions and views from more recent reports 
follow: (1) Blommers and Lindquist,? “the relationship between
rate of reading comprehension and power of reading comprehen-
sion is significant but low. . . .” (2) Tinker,® “The data warrant
the conclusion that there is an intimate relationship between
speed and comprehension in reading when the textual material
is within the reader’s educational experience.” (3) Stroud,?
notes that educational psychologists have been teaching that
a “moderately high correlation exists between reading rate and
comprehension.” (4) In the present experiment a medium-sized
correlation was found between rate of work and power of compre- 
hension in reading. 

Examination of the backgrounds from which these views were 
derived indicates that each view has a degree of validity. There 
are many reading skills which are somewhat independent of each 
other. The relationship between rate and comprehension need 
not be the same in any two reading situations if there are differ- 
ences in textual content, nature of response required from reader, 
purpose for which the material is read, and difficulty. Further- 
more, the conclusions in any study probably can be applied 
only to the reading of groups like the one measured. 

In general the trend of results from various sources indicates 
that there is a significant relationship between rate and compre- 
hension in reading. This tendency of good comprehension to 
accompany fast reading varies from a slight relationship to a 
moderately high relationship. Factors affecting the size of this 
relationship have been discussed above. 


SUMMARY 


1) The purpose of this investigation is to determine the 
relation between rate of work and comprehension in reading and 
to compare the results with those obtained from more mature 
readers. 

2) The reading material consisted of the first five subtests of 
Form A, Revised Iowa Silent Reading Test: Advanced. 

3) The readers were one hundred high-school freshmen. 
They were tested individually. 

4) Scores derived from the data were: (1) Power of compre- 
hension, i.e., the number of exercises done correctly in the stand- 
ard time limit. (2) Level of comprehension, i.e., the number of 
exercises done correctly in unlimited time. (3) The number of 
exercises attempted in standard time. (4) Rate of work, i.e., 
the time taken to complete the test. 

5) Comprehension is defined in terms of how it is measured 
on the test used. 

6) Rate of work, as here defined, yields a relative score which 
appears to be equivalent to Blommers and Lindquist’s criterion 
of rate of comprehension. The correlation coefficient between 
them is .95. 

7) The correlations between rate of work and power of 
comprehension for the subtests ranged from —.23 to —.59; 
for the total score, r = —.51. For these readers and for the test 
used, therefore, there is a small to moderate degree of correlation 
between rate of work and power of comprehension. 

8) For more mature readers (university sophomores), while 
reading the same material, the correlations between rate of work 
and comprehension ranged from —.48 to —.85; for total score, 
r = —.80. Since one may safely assume that the test material 
was more difficult for the high-school freshmen than for the 
university sophomores, it may be concluded that the relation 
between rate of work and power of comprehension becomes less 
when the reading material becomes more difficult. 

9) The correlations between rate of work and level of com- 
prehension were too small to indicate any significant relation. 
This may be partly due to the fact that some readers reached 
their level of comprehension under standard time limits while 
others did not. 

10) When employing time to complete the test as a measure of 
rate, it is probably best to speak of rate of work rather than rate 
of comprehension. Nevertheless, since Blommers and Lindquist 
find a correlation of .95 between rate of work and their criterion 
of rate of comprehension, the two techniques apparently are 
measuring the same thing. 

11) Analysis of findings indicates that there is a significant 
relation between rate and comprehension in reading. That is, 
the fast reader tends to comprehend better. This tendency 
varies from a slight relationship to a moderately high correlation. 
Factors which appear to affect the size of this correlation include 
nature of the reading task, techniques of measurement, difficulty 
of the textual material, and purpose for which the reading is done. 
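
The negative coefficients in items 7 and 8 follow from the way rate of work is scored: it is the time taken to complete the test, so a shorter time paired with a higher comprehension score yields a negative r. A minimal sketch of the product-moment computation is given below. The function name and the six pairs of values are hypothetical illustrations and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical readers: minutes needed to finish the test (rate of work)
# and exercises correct within the standard time limit (power).
minutes_to_finish = [38, 42, 47, 51, 55, 60]
power_scores = [61, 58, 50, 52, 44, 41]

r = pearson_r(minutes_to_finish, power_scores)
print(f"r = {r:.2f}")  # negative: faster readers tend to score higher
```

With time as the rate measure, a negative r therefore expresses the same tendency that a positive r would express if rate were scored as amount read per unit of time.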


A STUDY IN THE SELECTIVE CHARACTER OF 
AMERICAN SECONDARY EDUCATION: 
PARTICIPATION IN SCHOOL ACTIVITIES AS 
CONDITIONED BY SOCIO-ECONOMIC STATUS 
AND OTHER FACTORS¹ 


HENRY P. SMITH 
School of Education, Syracuse University 


STATEMENT OF PROBLEM 


The American high school has been developed in the demo- 
cratic tradition of equal educational opportunity for all youth 
irrespective of socio-economic position. Never more than two- 
thirds, and until about fifteen years ago not even a majority, 
of the youth of high-school age have taken advantage of the 
educational opportunities thus provided at the secondary-school 
level. The greater portion of those from the upper strata, socio- 
economically speaking, have continued their schooling through 
high school. On the other hand less than a third of those from 
the lower strata have done so. To a considerable extent, there- 
fore, the democratic ideal has not been realized in the American 
high school. 

The high school, and for that matter the elementary school, 
has been selective in another sense: pupils from all strata of 
society have not shown equal aptitude for learning. As we shall 
see presently, pupils drawn from the lower strata are at con- 
siderable disadvantage in comparison with those drawn from 
the upper strata in the pursuance of the academic courses of the 
curriculum. 

More and more, educators are coming to see the importance 
in the total educative process of the informal learning that takes 
place between pupils. The fact of being a physical member of a 
group is not sufficient. It is necessary to be accepted more or 
less as an equal, to be taken in, in order to profit fully from one’s 
educational opportunities. Pupils who are excluded miss some 
of the most important consequents of school life. One aspect of 
this phase of education, participation in so-called extracurricular 
activities, is here singled out for investigation. That the school 
authorities recognize the importance of this phase of education 
is seen in the large number of activities recognized or sponsored 
by the typical high school. In the larger high schools thirty to 
forty is not an uncommon number. 

¹ This article is based on the writer’s doctorate dissertation, State 
University of Iowa, 1943. Grateful acknowledgement is made to Prof. J. B. 
Stroud under whose direction this work was done. 

The purpose of this investigation is to determine, within the 
limits of the procedure, the relationship between socio-economic 
status and participation in extracurricular activities at the 
secondary-school level. It seeks to throw some light upon the 
question of whether the American high school is, in addition to 
the two foregoing respects, selective in a third respect; namely, 
in participation in extracurricular activities. 

In addition, as subsidiary problems, the relationship between 
participation in extracurricular activities and certain other 
factors: emotional and social adjustment, introversion-extro- 
version score, and score on vocabulary and achievement tests 
have been investigated. 


RELATED RESEARCH 


Certain problems bearing only indirectly upon the case under 
investigation, but which bear directly upon the larger problem 
of the selective character of American secondary education, are 
here discussed briefly. As is well known there is a positive 
correlation between socio-economic status and test intelligence, 
the modal coefficient being between .40 and .50. From her 
review of the literature Loevinger estimated that there is about 
one and a half standard deviations between the mean IQ of 
children drawn from the professional class and the mean of those 
drawn from the unskilled labor class. This amounts to about 
twenty-four Stanford-Binet IQ points. The mean IQ of the 
upper group was placed 1 SD above 100, and that of the lower 
group at .5 SD below 100. In their standardization data Terman 
and Merrill obtained a difference of about twenty points 
between these two groups. 

Holley in 1916 and Counts in 1922 noted that while the 
American public school system is operated under a philosophy 
that all youth are entitled to an education at public expense, the 
percentage of youth taking advantage of the opportunities 
offered varies according to the socio-economic status of the 
parents. Counts found a positive correlation between parental 
occupation and high-school attendance. Studies with findings 
similar to these were made by Goetsch, Van Denburg, Jordan, 
Wessel, and Kefauver, Noll and Drake. 

Recently much additional emphasis has been given this prob- 
lem in reports by Bell of the American Youth Commission’s 
survey in Maryland. Of a sampling of 13,528 cases between the 
ages of sixteen and twenty-four years, one-fifth were in school. 
Of those youth listed as permanently out of school 39.1 per cent 
had not gone beyond the eighth grade; 26.5 per cent had graduated 
from high school; and 10.7 per cent had attended college. 
The factors listed as being related to continuance in school are: 
race, relief status, sex, size of family, and paternal occupation. 
The probability that a white child will complete the eighth grade 
was found to be twice as large as for a Negro child. Approxi- 
mately the same probability held for a non-relief as opposed to a 
relief child. Forty-five per cent of the boys and thirty-two per 
cent of the girls had failed to go beyond the eighth grade. Of 
the children of professional and technical workers but 7.5 per cent 
failed to go beyond the eighth grade, while of the farm children 
48.1 per cent and of the unskilled laborers 66.1 per cent stopped 
school at the eighth grade or before. Data presented by Karpinos, 
derived from the 1940 United States Census, are to the 
same purpose. 

Stroud, in a recent article, has reviewed the literature per- 
taining to the relationship between socio-economic factors and 
academic achievement, and has presented additional data. He 
concludes that the relationship between measures of socio- 
economic status and school marks is about as close as that between 
socio-economic factors and test intelligence or between intelli- 
gence and school marks. 

It is the belief of many educators that successful participation 
in extracurricular activities, perhaps more than any other portion 
of the school program, contributes toward the formation of 
desirable personality traits. Koos in 1926 summarized the 
values to be obtained from participation in extracurricular 
activities as recognized by forty writers in that field. The most 
commonly recognized values pertained to social cooperation and 
social leadership. Among the values listed by the writers 
reviewed were experience in group life, training in leadership, 
training for recreation, improving discipline and school spirit, 
maintaining good health, improving scholarship, improving the 
relation of the school and the community, training for citizenship 
in a democracy, and training for ethical leadership. 

Strang, Douglass, Fretwell, and Rugg, writing since the 
time of the review by Koos, mention values similar to those in his 
list. They emphasize particularly that training in social, moral, 
and civic relationships is extremely important. 

If it is true that these activities contribute all that the writers 
in the field believe they do, it becomes especially important to 
know what the more important factors associated with participa- 
tion in them are. We should know whether or not the child 
from the unfavored home has as much chance to gain status 
through extracurricular activities as does the child from the 
well-to-do home. Since we already know that the child less 
favored economically is less successful in the classroom than is 
the child from the favored home, we may suspect that such a 
child is more in need of the status-giving values of these activities 
than is the favored child. 

To date there has been comparatively little work done in this 
field. Wright has studied the participation in extracurricular 
activities of approximately fifteen hundred students in a high 
school in Portland, Oregon. He classified the occupations of the 
fathers into a six-level scale similar to the Taussig classification. 
His results indicate the presence of a rather close relationship 
between occupational level and participation in extracurricular 
activities, particularly in regard to election to class office, taking 
part in plays, and control of the publications of the school. It is 
this problem with which the present investigation is chiefly 
concerned. 


PROCEDURE 


This study consists of an investigation of factors related to 
participation in extracurricular activities in one large high school. 
This school, the only high school in the city in question, draws 
students from the entire city and from nearby farms and com- 
munities. Data were gathered from seventeen hundred fifty-one 
students in the tenth, eleventh, and twelfth grades. Except for 
absences at the time the data were gathered, this represents the 
entire enrollment in these three grades. 

A questionnaire was submitted to each student for the purpose 
of ascertaining the extent of his participation in extracurricular 
activities and of gathering such additional information as activities 
in which he would like to have participated, his reasons for 
not participating in them, the distance he lived from school, 
whether or not he was employed at remunerative work, and his 
future occupational interests. 

The Sims Score Card was used to obtain an index of socio-economic 
status. The Bell Adjustment Inventory was administered 
for purposes of obtaining data on emotional, home and 
social adjustment. 

It was felt that the standing of a pupil on an introversion- 
extroversion scale might be found to be related to extracurricular 
participation. A scale of thirty-five items, selected from the lists 
of Guilford and Stagner upon the basis of discriminative value, 
was also administered. Scores made on the Iowa Test of Edu- 
cational Development were also available for most of the students. 

The names of students participating in each activity and the 
extent of their participation as well as data on the costs and func- 
tion of the activities were secured from faculty sponsors. Addi- 
tional information regarding activities was obtained from the 
school newspaper issued ten times each semester. 

Because of an increase in physical and mental maturity from 
grade to grade, participation in certain of the extracurricular 
activities may vary with grade level. Also certain of the activi- 
ties are limited to one or the other of the sexes. This makes it 
necessary to compare the scores obtained by each grade and sex 
on the various tests before attempting to make comparisons 
between the test scores of participants in any one extracurricular 
activity and those of another activity, or between the mean 
scores of participants in an activity and the mean of the school 
population as a whole. If significant changes are observed in test 
scores from grade to grade or if one sex tends to obtain higher 
scores than the other, it would not be legitimate to make comparisons 
between participants in activities or between participants 
and the general school population, lest any differences found 
be attributable to selection of the members on the basis of grade 
or sex rather than to any selective factor present in the extra- 
curricular activity itself. 

Tests of the significance of the grade to grade and sex differ- 
ences were applied to the mean scores on these various tests. 
The obtained differences between the mean scores on each of the 
various tests and the critical ratios of these differences, as 
computed by the formulas 

$$\text{est'd } \sigma_{(M_1 - M_2)} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$

and 

$$\text{C.R.} = \frac{M_1 - M_2}{\text{est'd } \sigma_{(M_1 - M_2)}},$$

are reported in Table 1. 
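
The two formulas can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names and the means and standard errors in the example are hypothetical and are not taken from Table 1.

```python
import math


def se_of_difference(se_mean_1, se_mean_2):
    """Estimated standard error of the difference between two independent means."""
    # math.hypot(a, b) returns sqrt(a**2 + b**2).
    return math.hypot(se_mean_1, se_mean_2)


def critical_ratio(mean_1, mean_2, se_mean_1, se_mean_2):
    """Difference between two means divided by its estimated standard error."""
    return (mean_1 - mean_2) / se_of_difference(se_mean_1, se_mean_2)


# Hypothetical values for two subgroups (for example, two grades).
cr = critical_ratio(mean_1=12.0, mean_2=10.0, se_mean_1=0.3, se_mean_2=0.4)
print(round(cr, 2))
```

For large samples a critical ratio of roughly 1.96 corresponds to the five-per-cent level and roughly 2.58 to the one-per-cent level, which is presumably the standard applied to the entries of Table 1.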


Four of the twenty differences reported in Table 1 are sig- 
nificant at the one-per-cent level and three others are significant 
at the five-per-cent level. If we grant that certain activities 
tend to draw their participants more heavily from one grade or 
sex than from another, then in order to make direct comparisons 
between the mean scores obtained by pupils participating in an 
extracurricular activity and the mean scores of the population 
as a whole we would need to make the assumption that there are 
no grade to grade or sex differences in scores on the particular 
test under consideration. But we can not make such an assump- 
tion. The data presented in Table 1 indicate that both grade 
to grade and sex differences exist in mean scores on at least some 
of the tests. 








TABLE 1.—DIFFERENCES BETWEEN MEANS AND CRITICAL RATIOS 
BY GRADE AND BY SEX 

            Grades X       Grades X       Grades XI      Boys and 
            and XI         and XII        and XII        Girls 
Test        Diff    CR     Diff    CR     Diff    CR     Diff    CR 
Soc-E        .53   1.61    1.18   3.59     .75   2.23     .32   1.19 
Emot         .10    .26     .47   1.23     .37    .94    4.83  16.71 
Home         .51   1.50     .71   2.13     .20    .58     .90   3.27 
Social       .31    .75     .91   2.09     .60   1.35     .33    .94 
Int-E        .04    .14     .27    .94     .31   1.04    1.12   4.79 

We may, however, determine what the expected mean would 
be for a sample of any combination of individuals drawn from 
a population of this type in which the mean of each of the sub- 
groups (in this case grade and sex) is known. We may also 
compare the mean scores of the individuals in any extracurricular 
activity with this special population mean and determine whether 
or not a hypothesis is tenable that the individuals in that extracurricular 
activity may be regarded as a random sample from 
such a population. 

For example, if we know that there are ten eleventh-grade 
boys and twenty twelfth-grade boys belonging to a certain club, 
we may determine what the expected mean socio-economic score 
would be for such a group. To do this we multiply the mean 
socio-economic score of our population of eleventh-grade boys 
by ten and the mean score for the twelfth-grade boys by twenty, 
add the two products and divide by thirty. The result is the 
mean we would expect to obtain (except for errors in sampling) 
in a random sample from the population if our sample consisted 
of ten eleventh-grade and twenty twelfth-grade boys. 
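
The weighting just described can be sketched as follows. The club composition (ten eleventh-grade and twenty twelfth-grade boys) comes from the example in the text; the two subgroup means and the function name are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def expected_mean(member_counts, subgroup_means):
    """Mean expected for a group with the given grade-and-sex composition.

    member_counts maps a (grade, sex) subgroup to the number of group members
    drawn from it; subgroup_means maps the same keys to the mean score of that
    subgroup in the whole school.
    """
    total = sum(member_counts.values())
    weighted_sum = sum(n * subgroup_means[key] for key, n in member_counts.items())
    return weighted_sum / total


club_counts = {("XI", "boys"): 10, ("XII", "boys"): 20}
# Hypothetical school-wide socio-economic means for the two subgroups.
school_means = {("XI", "boys"): 17.0, ("XII", "boys"): 18.5}

expected = expected_mean(club_counts, school_means)
print(expected)  # (10 * 17.0 + 20 * 18.5) / 30
```

The same function covers any mixture of grades and sexes, since each subgroup simply contributes its school-wide mean in proportion to its share of the membership.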

In dealing with the scores of the members of any group we 
may determine the mean and estimated standard error of that 
mean on the particular test or scale under consideration. Using 
the formula 

$$t = \frac{M - M_{pop}}{\text{est'd } \sigma_M},$$

where M is the mean of the sample and M_pop the population mean, 
we may then determine the limiting 
values within which the population mean of that sample must 
lie with any predetermined degree of confidence and for the 
number of cases in the sample. 

If the population mean for a sample controlled on the basis 
of grade and sex lies outside this confidence interval, we may be 
confident at the level demanded by our hypothesis that the 
difference between the mean scores of the pupils in that extracurricular 
activity and the mean of the population of the school 
as a whole is not attributable to chance and that some selective 
factor in addition to grade and sex must be operating. 

An analysis was made of the socio-economic and personality 
scores of the thirty-four Negro students in attendance. The 
mean socio-economic score was found to be 14.44 as compared 
with a population mean, controlled for grade and sex, of 17.37. 
The difference of 2.93 with a σ_M of .99 gives a t of 2.94 which, 
with thirty-three degrees of freedom, is barely significant at the 
one-per-cent level. No significant differences were found 
between the scores of the Negro pupils and the scores of the 
population as a whole on the personality tests. Owing to the 
small number of Negro pupils and the small differences in scores 
these students were not eliminated from either the population 
distribution or from the extracurricular groups studied. 

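
The comparison described above can be sketched with the figures given for the thirty-four Negro students (sample mean 14.44, adjusted population mean 17.37, standard error .99). The function names are illustrative, and the critical value of t for thirty-three degrees of freedom at the one-per-cent level (about 2.73) is an assumption taken from a standard two-tailed t table rather than from the article. The rounded inputs give a t of about 2.96 in absolute value, close to the 2.94 reported.

```python
def t_ratio(sample_mean, population_mean, se_mean):
    """Deviation of the sample mean from the population mean in standard-error units."""
    return (sample_mean - population_mean) / se_mean


def confidence_limits(sample_mean, se_mean, t_critical):
    """Limits within which the population mean must lie at the chosen level."""
    margin = t_critical * se_mean
    return sample_mean - margin, sample_mean + margin


# Assumed two-tailed critical t, one-per-cent level, 33 degrees of freedom.
T_CRIT_1PCT_33DF = 2.733

t = t_ratio(sample_mean=14.44, population_mean=17.37, se_mean=0.99)
lower, upper = confidence_limits(14.44, 0.99, T_CRIT_1PCT_33DF)
outside = not (lower <= 17.37 <= upper)

print(round(t, 2), round(lower, 2), round(upper, 2), outside)
```

Testing whether the adjusted population mean falls outside the limits is equivalent to testing whether the absolute value of t exceeds the critical value, so either form may be used.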

It will be noted that there is an element of bias in the scores 
on the Sims Score Card when it is used in comparing the socio- 
economic scores of members of fraternities and sororities with 
the scores of members of the population as a whole. In one of 
the questions on this score card the pupil is asked whether he 
belongs to an organization to which he pays dues. Pupils 
answering in the affirmative receive credit for three points 
toward the total raw score. This raw score is later divided by 
the number of questions answered and multiplied by ten to 
remove the decimal point. Many of the sorority and fraternity 
members belonged to other organizations and would have 
received credit from the item in any event. The average bias 
arising from this source for the groups of fraternity and sorority 
members appears to be less than one point on the final socio- 
economic score. 
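
The size of this bias can be illustrated with the scoring rule just described (raw score divided by the number of questions answered, then multiplied by ten). The raw score and the number of questions answered below are hypothetical, as are the function names; only the three-point credit for the dues item comes from the text.

```python
def sims_score(raw_points, questions_answered):
    """Final score: raw points per question answered, multiplied by ten."""
    return 10 * raw_points / questions_answered


def dues_item_effect(raw_points, questions_answered, dues_credit=3):
    """Rise in the final score when the dues item is credited to a pupil
    who would otherwise have received nothing for it."""
    with_credit = sims_score(raw_points + dues_credit, questions_answered)
    return with_credit - sims_score(raw_points, questions_answered)


# Hypothetical pupil: 40 raw points on 20 answered questions.
print(sims_score(40, 20))        # score without the dues credit
print(dues_item_effect(40, 20))  # largest shift this pupil could show
```

The figure returned is an upper limit for a single pupil. Since many fraternity and sorority members would have earned the credit through other organizations anyway, the average effect for the group is smaller, which agrees with the estimate of less than one point.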

As used here participation means number of activities. It is 
recognized that there are differences in quality as well as in 
quantity of participation, but quality is not considered to be 
within the scope of the present study. 


RESULTS 


Socio-Economic Status and Extracurricular Participation.—In 
accordance with the procedure outlined in the preceding section, 
an analysis was made of the socio-economic scores obtained by 
members of each of the various extracurricular organizations. 
The mean of these scores, the population mean adjusted for 
grade and sex, the difference between the sample and population 
mean, the value of t, and the value of t necessary for significance 
at the one-per-cent level are presented in Table 2. 

The differences in socio-economic scores between students 
participating in extracurricular activities and the students in 
the school population as a whole are probably the most striking 
features of the data obtained in this study. Almost without 
exception the mean socio-economic score of the students par- 
ticipating in extracurricular activities is higher than the adjusted 
mean score of the school population as a whole. In most cases 
the differences are statistically significant at well beyond the 
one-per-cent level. In but three activities does the mean socio- 
economic score fall below the population mean and none of these 
three negative deviations is statistically significant. 

TABLE 2.—MEAN SCORES OF STUDENTS PARTICIPATING IN 
VARIOUS EXTRACURRICULAR ACTIVITIES COMPARED WITH THE 
SCHOOL MEAN ON THE BASIS OF SOCIO-ECONOMIC STATUS 

Columns: Activity; Sample Mean; Pop Mean; Diff; Value of t; 
t sig 1 per cent level. 

Activities legible in the scan: Football, Basketball, Delta S Sor, 
Delta U Sor, Kappa D Frat, Delta Th Frat, Alpha Z Frat, Spanish Club, 
German Club, French Club, Dramatic Club, NFL (Debate), Normal Club, 
Jane Addams Club, Girls Ath Assn, Student Club, Quill Club, 
Publications, Leadership, ROTC Officers, Secy-Treas. 

Sample Mean column, in the order scanned: 18.52, 19.98, 16.41, 18.78, 
20.43, 18.91, 18.84, 27.00, 23.50, 27.14, 25.94, 24.00, 22.11, 21.84, 
24.70, 21.48, 22.00, 22.79, 21.18, 16.81, 17.25, 18.26, 22.78, 21.28, 
20.86, 21.76, 18.59, 24.03, 19.86, 19.79, 19.75. 

[Remaining columns of Table 2 are not recoverable from the scan.] 


In order to portray more fully the selective character of the 
activities the per cent of students earning various socio-economic 
scores was calculated for the school as a whole and for participants 
in certain extracurricular activities. Table 3 shows the result 
of this tabulation for the combined fraternity and sorority 
membership, the boys' Hi-Y organization, and the corresponding 
girls' organization known in this school as the Student Club. 

The combined fraternity and sorority groups were chosen for 
this further analysis because Table 2 indicates that they are the 
most selective groups studied, and the Hi-Y and Student Club 
were chosen because they are open to all high-school pupils who 
care to join. 


TABLE 3.—PER CENT OF THE TOTAL MEMBERSHIP OF VARIOUS 
GROUPS FOUND IN EACH SOCIO-ECONOMIC CATEGORY 

                Entire     Fraternity               Student 
Sims Score      School     Sorority       Hi-Y      Club 
33-34             .40        5.33         3.70       1.20 
31-32             .57        5.33          .00        .60 
29-30            2.51       21.33         5.56       4.79 
27-28            3.77       20.00         9.26      10.18 
25-26            4.74        9.33        12.96      13.17 
23-24            9.19       18.67        22.22      13.77 
21-22            8.85       12.00        20.37      11.38 
19-20           12.73        5.33        12.96      16.77 
17-18           15.25        2.66         3.70      10.18 
15-16           11.82         .00         5.56       7.78 
13-14           11.88         .00          .00       4.19 
11-12            7.42         .00         3.70       4.19 
9-10             5.94         .00          .00        .60 
7-8              2.91         .00          .00        .60 
5-6              1.48         .00          .00        .60 
3-4               .46         .00          .00        .00 


From Table 3 it will be noted that while the Hi-Y and Student 
Club are not so highly selective as are the fraternities and sorori- 
ties, they are at the same time more selective than they are 
commonly intended to be. In the case of the fraternities and 
sororities, ninety-two per cent of the membership is found to be 
drawn from the upper thirty per cent of the school population 
and one hundred per cent of the membership is drawn from the 
upper fifty-three per cent of the school. The Hi-Y organization 
drew seventy-four per cent of its members from the upper thirty 
per cent and ninety-one per cent from the upper fifty-three per 
cent of the school population, while the Student Club drew fifty- 
five per cent and seventy-two per cent, respectively, from these 
two categories. 
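
The first of these cumulative figures can be checked directly from Table 3 by summing each column from the top down to the 21-22 band, which is the point at which the entire-school column reaches about thirty per cent. The sketch below does only that; the data are the tabled percentages, while the names TABLE_3, COLUMNS, and share_at_or_above are illustrative.

```python
# Rows of Table 3: (lowest Sims score in band, entire school,
# fraternity and sorority, Hi-Y, Student Club), all in per cent.
TABLE_3 = [
    (33, 0.40, 5.33, 3.70, 1.20),
    (31, 0.57, 5.33, 0.00, 0.60),
    (29, 2.51, 21.33, 5.56, 4.79),
    (27, 3.77, 20.00, 9.26, 10.18),
    (25, 4.74, 9.33, 12.96, 13.17),
    (23, 9.19, 18.67, 22.22, 13.77),
    (21, 8.85, 12.00, 20.37, 11.38),
    (19, 12.73, 5.33, 12.96, 16.77),
    (17, 15.25, 2.66, 3.70, 10.18),
    (15, 11.82, 0.00, 5.56, 7.78),
    (13, 11.88, 0.00, 0.00, 4.19),
    (11, 7.42, 0.00, 3.70, 4.19),
    (9, 5.94, 0.00, 0.00, 0.60),
    (7, 2.91, 0.00, 0.00, 0.60),
    (5, 1.48, 0.00, 0.00, 0.60),
    (3, 0.46, 0.00, 0.00, 0.00),
]
COLUMNS = {"school": 1, "fraternity_sorority": 2, "hi_y": 3, "student_club": 4}


def share_at_or_above(min_score, group):
    """Per cent of a group whose Sims score band starts at min_score or higher."""
    col = COLUMNS[group]
    return sum(row[col] for row in TABLE_3 if row[0] >= min_score)


for group in COLUMNS:
    print(group, round(share_at_or_above(21, group), 2))
```

Summed this way, the upper bands hold about 30 per cent of the school, 92 per cent of the fraternity and sorority membership, 74 per cent of the Hi-Y, and 55 per cent of the Student Club.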

In order to present a further analysis of the selectivity of 
extracurricular activities and to obtain data regarding certain 
other factors not easily treated under the procedure of con- 
trolling the population mean for grade and sex, a rather intensive 
study was made of the extracurricular participation of twelfth- 
grade boys. This group was chosen because extracurricular 
participation appears to be most general at this grade level and 
because more activities appear to be open to boys than to girls. 

To make this further analysis a record was made of all the 
activities participated in by each twelfth-grade boy during the 
current school year. The entire group was then divided into 
four categories: those who participated in no activities, those 
who participated in one, in two, and in three or more activities. 

Table 4 shows the results of this analysis in terms of the num- 
ber of boys in each category, the mean socio-economic score for 
each category, the standard deviations, and the estimated 
standard error of the mean. The trend toward higher mean 
score with increased participation is very marked. All the 
differences are significant at the one-per-cent level except the 
difference between the participants in two activities and those 
who participated in three or more. 


TABLE 4.—SOCIO-ECONOMIC STATUS OF TWELFTH-GRADE BOYS
PARTICIPATING IN VARYING NUMBERS OF EXTRACURRICULAR
ACTIVITIES

                          Number of    Mean     SD of    Estimated
Number of Activities        Cases      Score    Sample      σM
No Activities.........       121       16.50     4.63      .423
One Activity..........        74       18.49     5.71      .668
Two Activities........        46       21.46     5.18      .772
Three or More.........        30       22.47     4.34      .806
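
The estimated standard errors in Table 4 agree with the sample standard deviation divided by the square root of N - 1. The sketch below reproduces them and forms a critical ratio for the difference between two category means. The article does not state which test produced its significance levels, so the critical ratio for independent groups is an assumption here, and the names TABLE_4, standard_error and critical_ratio are illustrative.

```python
import math

# Table 4: category -> (number of cases, mean Sims score, SD of sample)
TABLE_4 = {
    "none": (121, 16.50, 4.63),
    "one": (74, 18.49, 5.71),
    "two": (46, 21.46, 5.18),
    "three_or_more": (30, 22.47, 4.34),
}


def standard_error(n, sd):
    """Estimated standard error of the mean, SD / sqrt(N - 1)."""
    return sd / math.sqrt(n - 1)


def critical_ratio(a, b):
    """Difference between two category means divided by the SE of the difference.

    Assumes independent groups, so the two squared standard errors add.
    """
    n_a, mean_a, sd_a = TABLE_4[a]
    n_b, mean_b, sd_b = TABLE_4[b]
    se_diff = math.hypot(standard_error(n_a, sd_a), standard_error(n_b, sd_b))
    return (mean_b - mean_a) / se_diff


if __name__ == "__main__":
    for name, (n, mean, sd) in TABLE_4.items():
        print(name, round(standard_error(n, sd), 3))
    print(round(critical_ratio("two", "three_or_more"), 2))
```

The same two functions apply unchanged to Tables 5, 6 and 7 once their rows are substituted.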


Emotional Adjustment and Extracurricular Participation.— 
It will be noted that scores on all sections of the adjustment 
inventory are reported in terms of maladjustment. That is to 
say, a low score is to be considered more desirable than a high 
score. No significant differences were obtained on the emotional 
adjustment test between the mean scores of students participating 
in the various activities and the mean score of the school 
population. There is, however, 
a rather marked sex difference in the scores on this test. The 
mean maladjustment score for the girls was 12.38 while the mean 
for the boys was 7.55. The mean score for the school population 
as a whole was 10.14. 

These differences serve to emphasize the necessity for a con- 
sideration of the makeup of our sample in terms of sex before 
comparing the sample mean with an unadjusted population 
mean. Had this not been done most of the extracurricular 
activities catering entirely to boys would have shown good 
emotional adjustment while samples made up of girls would have 
shown poor adjustment. 

Home Adjustment and Extracurricular Participation.—No 
significant differences were found between the mean home mal- 
adjustment scores of students participating in the various extra- 
curricular activities and the mean scores of the school population. 

Social Adjustment and Extracurricular Participation.—Some 
rather marked differences were obtained in social maladjustment 
scores between the mean scores for members of many of the 
extracurricular activities and the mean scores of the population 
of the school as a whole. In all but three cases the members of 
the extracurricular activities earned a more favorable social 
adjustment score than did the school population. None of the 
three deviations from this general tendency were statistically 
significant at the one-per-cent level while fifteen of the deviations 
toward better adjustment were statistically significant. The 
groups whose members showed superior adjustment were: 
Tennis, German Club, Dramatic Club, Debate, Normal Club, 
Hi-Y, Student Club, Quill Club, Publications, Leadership, ROTC 
Officers and one sorority and three fraternities. 

Table 5 compares the mean social maladjustment scores of 
twelfth-grade boys categorized upon the basis of the amount of 
their participation in extracurricular activities. All the dif- 
ferences between scores presented in Table 5 are significant at 
the one-per-cent level except the difference between those who 
participated in one activity and those participating in two 
activities. 


TABLE 5.—SOCIAL MALADJUSTMENT SCORES FOR TWELFTH-GRADE BOYS
PARTICIPATING IN VARYING NUMBERS OF EXTRACURRICULAR
ACTIVITIES

                          Number of    Mean     SD of    Estimated
Number of Activities        Cases      Score    Sample      σM
No Activities.........       121       14.79     6.84       .62
One Activity..........        74       11.81     7.67       .90
Two Activities........        46       10.39     7.73      1.15
Three or More.........        30        6.03     4.82       .89


Introversion-Extroversion and Extracurricular Participation.—
Only two of the thirty-one activities studied, Hi-Y and the 
Girls Athletic Association, showed mean scores significantly 
different from the population on the thirty-five item 
introversion-extroversion test. These differences were in the 
direction of extroversion. These two differences were significant 
at approximately the .1 of one-per-cent level if considered alone, 
but when considered as the highest differences in a group of 
thirty differences to which the t test had been applied could not 
be considered significant with anything like that degree of 
confidence. 

Vocabulary and Extracurricular Participation.—Scores obtained 
on the Iowa Test of Educational Development were available 
for 201 of the 271 boys in the twelfth grade. A comparison of 
the mean scores obtained by twelfth-grade boys on the vocabulary 
section of this test was made between categories determined on 
the basis of extent of extracurricular participation. The results 
obtained from this comparison are presented in Table 6. 


TABLE 6.—VOCABULARY SCORES FOR TWELFTH-GRADE BOYS
PARTICIPATING IN VARYING NUMBERS OF EXTRACURRICULAR
ACTIVITIES

                          Number of    Mean     SD of    Estimated
Number of Activities        Cases      Score    Sample      σM
No Activities.........        92       15.92     4.90       .51
One Activity..........        53       19.19     4.30       .60
Two Activities........        34       20.29     4.31       .75
Three or More.........        22       20.77     3.84       .84


The differences in mean scores between those who did not 
participate and the three categories of participants are all 
significant at well beyond the one-per-cent level. The differences 
between those who participated in varying numbers of activities 
are not significant at the one-per-cent level for this test. 

Educational Development and Extracurricular Participation.— 
Table 7 shows the result of a comparison of scores made on the 
Iowa Test of Educational Development. All the differences in 
mean scores between those participating in no activities and 
those participating in one, two, three or more activities are 
significant at the one-per-cent level. None of the differences 
between participants in one, two, three or more activities are 
significant at that level although the trend in all cases is toward 
higher mean score with increase in amount of participation. 

Distance from School and Extracurricular Participation.—An 
analysis was made of the amount of participation of 269 twelfth- 
grade boys according to the distance of their homes from the 
high school. Of those who participated in no activities approxi- 
mately forty per cent lived three or more miles from school and 
thirty-eight per cent lived less than two miles from school. In 
the case of the students participating in two or more activities 
approximately twenty-six per cent lived three or more miles from 
school and forty-five per cent lived less than two miles from 
school. 

Socio-Economic Score and Adjustment Scores.—Pearson prod- 
uct-moment correlation coefficients were computed between the 
scores obtained on the Sims Score Card and the three tests of 
adjustment and the extroversion score. These coefficients are 
presented in Table 8. 
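
As a brief illustration of the statistic named above, the following Python sketch computes a Pearson product-moment coefficient from paired scores. The function name pearson_r and the five pairs of scores are hypothetical and serve only to show the calculation; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    sims_scores = [10, 15, 20, 25, 30]
    maladjustment = [18, 15, 11, 9, 4]
    print(round(pearson_r(sims_scores, maladjustment), 2))
```

A negative coefficient, as in this toy example, would indicate that higher socio-economic scores accompany lower maladjustment scores.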


TABLE 7.—TOTAL SCORE ON THE IOWA TEST OF EDUCATIONAL
DEVELOPMENT FOR TWELFTH-GRADE BOYS PARTICIPATING IN
VARYING NUMBERS OF EXTRACURRICULAR ACTIVITIES

                          Number of    Mean     SD of    Estimated
Number of Activities        Cases      Score    Sample      σM
No Activities.........        92       16.96     4.98       .52
One Activity..........        53       20.36     4.83       .67
Two Activities........        34       20.94     5.35       .93
Three or More.........        22       21.32     5.21      1.14


The obtained coefficients are low except in the case of socio- 
economic status and social maladjustment. They are, however, 
consistent as to direction and are based on rather large samples. 

Mode of Transportation and Extracurricular Activities.—From 
answers to the questionnaire it was possible to determine the 
most common mode of transportation available for one hundred 
fifty of the two hundred seventy-one twelfth-grade boys. The 
types of transportation were grouped into two categories: inde- 
pendent, which included walking, riding a bicycle, city bus, or 
driving a car; and dependent, which included riding in a school 
bus or in a car driven by others. Of the boys participating in 
no activities eighty-five per cent were independent in regard to 
transportation while of those participating in two or more 
activities ninety-two per cent were classed as independent. 


THE SELECTIVE CHARACTER OF AMERICAN SECONDARY EDUCATION 


Previous investigations have shown that students of low socio- 
economic status tend to leave school sooner than do those of 
higher socio-economic status, and that they fail to gain as much 
from the pursuance of the curriculum. The present investiga- 
tion confirms Wright in showing that pupils drawn from unfav- 
ored homes do not participate so liberally in the extracurricular 
functions of the school. 


TABLE 8.—CORRELATION BETWEEN SOCIO-ECONOMIC SCORES
AND SCORES ON TESTS OF ADJUSTMENT AND EXTROVERSION

                         Emotional      Home     Social    Extro-
                           Mal.         Mal.      Mal.     version
Boys (N = 812).......   [illegible]     -.12      -.40       .19
Girls (N illegible)..   [illegible]     -.14      -.35       .09
Total (N illegible)..   [illegible]     -.14     -.[3]7      .13


Although Wright did not apply tests of significance in his 
investigation of the selective nature of extracurricular participa- 
tion, these data agree for the most part with his findings. 

In view of the data presented here and the findings of other 
investigators cited previously, it appears legitimate to conclude 
that despite the American educational philosophy of an equal 
opportunity for all, children of the lower socio-economic groups 
drop out of school sooner and gain less from both the curricular 
and extracurricular functions of the school while in attendance 
than do children from the higher socio-economic levels. 


CONCLUSIONS 


The purpose of this study was to determine the extent to 
which participation in extracurricular activities was conditioned 
by socio-economic status, personality traits, and other factors 
in the case of a group of high-school students. 

The data obtained in this study appear to justify the following 
conclusions regarding the students studied: 

1) Extracurricular activities with but few exceptions tend to 
be selective in terms of socio-economic status. In twenty-eight 
of the thirty-one extracurricular groups studied the mean score 
on the Sims Score Card was higher than the mean score for the 
school population as a whole adjusted for the sex and school 
grade of the participants in each activity. Twenty of these 
differences in socio-economic status are significant at the one- 
per-cent level. 

2) Students participating in extracurricular activities show a 
definite tendency to be superior to non-participants in social 
adjustment score as measured. Twenty-eight of the thirty-one 
extracurricular groups studied obtained more favorable social 
adjustment scores. Fifteen of these differences in mean score 
between the extracurricular groups and the population adjusted 
for sex and grade are significant at the one-per-cent level. 

3) Students participating in extracurricular activities tend 
to be superior in scores on a vocabulary test and in scores on the 
Iowa Tests of Educational Development. 

4) Students participating in extracurricular activities tend to 
live closer to school than do non-participants. 

5) There appears to be some relationship between socio- 
economic score and scores on tests of emotional, home, and 
social adjustment and extroversion for both boys and girls. 


NOTE ON A REVISED BLOCK DESIGN TEST AS A 
MEASURE OF ABSTRACT PERFORMANCE 


WILLIAM GOLDFARB 
Foster Home Bureau, New York Association for Jewish Children 


The Block Design Test was originally standardized by Kohs as 
an individual test of intelligence. It was later adapted and 
incorporated in the Wechsler-Bellevue Intelligence Scale. 
In Wechsler’s correlations of each of the Bellevue Scale subtests 
with the total scale, the Block Design Test showed the highest 
correlation among the performance tests and one of the highest 
among all the subtests. Wechsler thus felt the test to be the 
best single performance item in his series. Similarly, in 
correlations between the Bellevue subtests and two other measures 
of intelligence (Otis, 20’ and CAVD), the Block Design Test 
showed itself to be the best measure of intelligence among the 
performance subtests. 

The test is thus acceptedly a good measure of general intelli- 
gence. There is now extensive clinical and experimental experi- 
ence with the test which would indicate that its value is derived 
from the efficacy with which it measures the higher forms of 
synthetic or analytic ability. This is demonstrated in its 
particular sensitivity to the absence or deterioration of the ability 
to generalize. Indeed, Goldstein and others have adapted 
the test as a device for determining whether the individual’s 
behavior is in accord with the ‘abstract’ or ‘concrete’ attitudes. 
With the test, they were able to demonstrate the concre- 
tistic approaches of schizophrenics and patients with cortical 
mal-functioning. 

The fullest description of the psychology of the ‘abstract 
attitude’ as expressed in the Block Design Test is given by 
Goldstein and Scheerer. The Goldstein-Scheerer Cube Test is 
of definite value in the clinical evaluation of the extent to which 
the abstract attitude is maintained. There is no doubt, how- 
ever, that the individual’s level of conceptualization will also 
express itself both in the qualitative performance and quantita- 
tive rating either in the Kohs Block Test or in the Wechsler- 
Bellevue Block Design Test. In addition, the more clear-cut 
quantitative rating permitted by the latter tests would in some 
respects make them better research instruments than the 
Goldstein-Scheerer Cube Test. Further, since there is value in 
securing a global measure of intelligence, one can at one and the 
same time employ the test as a subitem in an intelligence scale, 
as in the Wechsler-Bellevue Scale, and also as a focused measure 
of abstract ability. When, as in the Wechsler-Bellevue Scale, 
it is administered as one of a variety of serial tests, the results 
also permit an analysis of test pattern. The pattern analysis 
in itself would strengthen any clinical or qualitative impression 
one might develop regarding the subject’s ability to face prob- 
lems along conceptual lines. 

In an experimental investigation of the effects of institutional 
deprivation as expressed in an adolescent group, the Wechsler 
Block Design Test was employed as a measure of concept for- 
mation along with the Wechsler Similarities Test, the Weigl 
Color Form Test, the Vigotsky Test, the Rorschach Examination 
and the Graphic Rorschach Examination. In scoring the Weigl 
and Vigotsky Tests, the Zubin-Thompson directions for 
administration and scoring were followed, as these were found 
to be the most explicit and the most feasible for research purposes. 
All the tests were found to discriminate the experimental groups 
clearly. It was also our impression that all the tests had definite 
utility and were feasible instruments for the delineation of 
the abstract attitude among adolescents, ten years of age and 
over. 

Our present interest is in an adaptation of the Wechsler Block 
Design Test which was introduced in the deprivation experiment 
noted above. In initiating the experiment, it was our belief 
that the Wechsler Test had the following limitations as a measure 
of abstract performance: 

1) It is a timed test. It gives credit for speedy performance 
and may actually penalize the individual who is slow although 
he is capable of abstract activity. In any case, the question of 
reaction speed appears irrelevant in an investigation of abstract 
performance. 

2) It only gives credit for perfect reproduction of design. Its 
scoring is not based on a consideration of the varying levels of con- 
ceptual performance attained in each of the design reproductions. 

The following adaptation of the Wechsler Block Design Test 
was, therefore, devised. We shall refer to this adaptation as the 
Revised Block Design Test. The time limit for each card was 
extended to five minutes. Each of the design reproductions 
was scored on the following scale: 

Credit 0. Completely random grouping. There is no reflec- 
tion or analysis at all. The subject is passively guided by 
impulse alone, does not even observe color sequence, and is 
willing to accept the patently irrelevant product. 

Credit 1. The configuration is vaguely perceived and inade- 
quately analyzed. There is reaction to color sequence. There 
is apparent effort at analysis and, at times, perplexity in reaction 
to the poor results. 

Credit 2. (a) The configuration is correctly analyzed and 
resynthesized except that, because of impulse, one block is 
incorrectly placed. (b) Although incomplete, more than half of 
a design is correctly reproduced within the five-minute time 
limit. 

Credit 3. Complete design correctly reproduced within the 
five-minute time limit. 

The total score is the sum of credits on all the designs. The 
test consists of seven designs and, according to our method of 
scoring, the maximum score is 21. 
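
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under the stated rule (seven designs, each given a credit of 0 to 3, summed to a total); the names CREDIT_LEVELS, NUM_DESIGNS and total_score are hypothetical, and the judgment of which credit a reproduction earns remains with the examiner.

```python
CREDIT_LEVELS = (0, 1, 2, 3)  # credits defined in the scale above
NUM_DESIGNS = 7               # number of designs in the test


def total_score(credits):
    """Sum the examiner's credits for the seven designs (maximum 21)."""
    if len(credits) != NUM_DESIGNS:
        raise ValueError("expected one credit for each of the seven designs")
    for credit in credits:
        if credit not in CREDIT_LEVELS:
            raise ValueError("each credit must be 0, 1, 2, or 3")
    return sum(credits)


if __name__ == "__main__":
    # A subject who reproduces every design correctly within five minutes.
    print(total_score([3, 3, 3, 3, 3, 3, 3]))
```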

Tests were administered to thirty adolescents, including 
sixteen boys and fourteen girls at the mean age of twelve years 
and three months (sigma 13.6 months), range ten years, six 
months to fourteen years, one month. In addition to the above 
described revision of the Block Design Test, each child received 
ratings in the original Wechsler Block Design Test, the Wechsler 
Similarities Test, the Vigotsky Test and the Weigl Test. In 
Table I are the correlations between the Wechsler and Revised 
Block Design Tests and the other tests of abstract performances. 


TABLE I.

                                 Wechsler Block    Revised Block
Test                              Design Test       Design Test
Wechsler Block Design Test...                           .90
Wechsler Similarities Test...         .56               .65
Vigotsky (Zubin-Thompson)....         .47               .60
Weigl (Zubin-Thompson).......         .52               .57


All the correlations are significant at the one-per-cent level. 
(See Lindquist, Table 13.) Both Block Design Tests correlate 
very highly with each other. This is not completely unexpected, 
inasmuch as the correlated ratings are based on the same test 
performance and also because the question of slow reaction speed 
is not a serious one among adolescents. Both Block Design 
Tests also show uniformly significant correlations with the three 
other tests of abstract performance. It may be assumed that 
both tests offer equally good measures of the ability to generalize. 
There is only a suggestive tendency for the Revised Block 
Design Test to show higher correlations with other criteria of 
abstract ability. In the adolescent group studied, it is thus 
true that the Wechsler Block Design Test and the Revised 
Block Design Test offer equally good quantitative measures for 
purposes of group comparison. 
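
The statement that every coefficient in Table I reaches the one-per-cent level can be checked with the usual t transformation of r. The article refers to a table in Lindquist rather than to a formula, so the conversion below is an assumption standing in for that table lookup. The critical value 2.763 is the two-tailed one-per-cent point of Student's t for 28 degrees of freedom, taken from standard tables; the names are illustrative.

```python
import math

N = 30                 # adolescents tested
T_CRITICAL_01 = 2.763  # two-tailed 1% point of Student's t, 28 degrees of freedom

# Coefficients reported in Table I.
TABLE_I = {
    "wechsler_block_vs_revised_block": 0.90,
    "similarities_vs_wechsler_block": 0.56,
    "similarities_vs_revised_block": 0.65,
    "vigotsky_vs_wechsler_block": 0.47,
    "vigotsky_vs_revised_block": 0.60,
    "weigl_vs_wechsler_block": 0.52,
    "weigl_vs_revised_block": 0.57,
}


def t_for_r(r, n):
    """t statistic for testing a correlation against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def significant_at_one_per_cent(r, n=N):
    return abs(t_for_r(r, n)) > T_CRITICAL_01


if __name__ == "__main__":
    for label, r in TABLE_I.items():
        print(label, round(t_for_r(r, N), 2), significant_at_one_per_cent(r))
```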

The following case illustration from a report by Reynell is 
presented, however, as an example of a clinical problem in which 
the Revised Block Design Test might have been of contributory 
value. 


Pte. C. T.—sustained a head injury in motorcycle accident. Post 
traumatic amnesia four days. No fracture. His M. O. reported ‘very 
poor intelligence; it is impossible to get a reasonable history from 
him ....’ On Kohs Blocks he did all designs without difficulty but 
took three times the standard time in doing them. This was only four 
weeks after his accident, and after another four weeks he showed no 
retardation and no intellectual loss on the differential test. 


Psychomotor retardation has been observed as one of the 
most common consequences of traumatic cerebral impairment. 
On the other hand, Reynell makes the observation that this 
retardation may be of brief duration and may actually give a 
false picture of impairment if the psychological examination is 
superficial and if, by implication, the level of abstract per- 
formance is not adequately measured because of the inclusion of 
the speed factor. The Revised Block Design Test is thus offered 
as a method of assaying abstract ability in groups where there 
has been significant retardation of reaction speed. These would 
include, for example, aged adults and individuals with brain 
damage. 


RELATIONSHIP BETWEEN THE INDICES OF 
INTELLIGENCE DERIVED FROM THE 
KUHLMANN-ANDERSON INTELLIGENCE TESTS 
FOR GRADE I AND THE SAME TESTS FOR 
GRADE IV* 


MILDRED M. ALLEN 
School Psychologist, New Rochelle, New York, Public Schools 


The problem was a study to determine how scores obtained 
on the Kuhlmann-Anderson Intelligence Tests in Grade I are 
related to corresponding scores on the Kuhlmann-Anderson 
Intelligence Tests in Grade IV, and to determine to what extent 
scores in Grade IV are predicted by scores in Grade I. 

The subjects used in this study were three hundred and twenty- 
seven pupils from ten elementary schools in New Rochelle, 
New York. The Kuhlmann-Anderson Intelligence Tests were 
first administered in February, 1937, when the subjects were 
midway through the first grade. The Kuhlmann-Anderson 
Intelligence Tests, fourth-grade battery, were administered to 
the same pupils near the beginning of Grade IV in October, 1939. 


RESULTS AND INTERPRETATION 


The relationships between measures obtained from the Kuhl- 
mann-Anderson Intelligence Tests in Grade I, and the same test 
in Grade IV are shown in Table I. 

Table I indicates that MA in Grade IV is related about equally 
to either MA, IQ, or Pc. Av. obtained in Grade I, with 
correlation coefficients of .51, .51, and .52, respectively. Only a 
moderate degree of relationship is found, as correlations of 
this size have corresponding coefficients of alienation (k)¹ of 
.8602, .8602, and .8542, and indices of forecasting efficiency of 
E = 13.98, 13.98, and 14.58, respectively. Thus errors of 
predicting Kuhlmann-Anderson MA in Grade IV, from 
Kuhlmann-Anderson MA, IQ, or Pc. Av., in Grade I, are about 
fourteen per cent less than they would be if no correlation existed 
between the original and subsequent measures. This may be due 
to differences in the tests themselves, i.e., non-verbal content 
material of the tests for Grade I, versus the inclusion of more 
tests of a verbal nature in the battery for Grade IV. 
Inconstancy in mental growth may be another factor affecting this 
relationship. 

* Part of a study for a Doctor’s dissertation completed at New York 
University, Graduate School of Education, 1940. 

¹ k = Coefficient of Alienation, and E = Index of Forecasting Efficiency. 
J. P. Guilford, Psychometric Methods, McGraw-Hill Book Co., Inc., 1936, 
p. 362. 
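
The two indices used throughout this discussion follow from r by the standard definitions k = sqrt(1 - r²) and E = 100(1 - k). The article cites Guilford for these rather than printing the formulas, so the sketch below is offered as an aid to checking the figures and not as the author's own computation; the function names are illustrative.

```python
import math


def coefficient_of_alienation(r):
    """k = sqrt(1 - r^2): the share of prediction error left after using r."""
    return math.sqrt(1.0 - r * r)


def forecasting_efficiency(r):
    """E = 100 * (1 - k): per cent reduction in errors of prediction."""
    return 100.0 * (1.0 - coefficient_of_alienation(r))


if __name__ == "__main__":
    # Coefficients discussed in the text.
    for r in (0.39, 0.43, 0.51, 0.52, 0.69):
        k = coefficient_of_alienation(r)
        print(r, round(k, 4), round(forecasting_efficiency(r), 2))
```

Because E rises slowly for small r, a correlation near .50 removes only about one seventh of the error that would occur with no correlation at all.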


TABLE I.—COEFFICIENTS OF CORRELATION EXPRESSING THE
RELATIONSHIPS BETWEEN INDICES OF INTELLIGENCE
DERIVED FROM THE KUHLMANN-ANDERSON
INTELLIGENCE TESTS FOR GRADE I AND THE
SAME TEST FOR GRADE IV

                    Kuhlmann-Anderson Intelligence
                            Tests, Grade I
Grade IV             MA          IQ        Pc. Av.*
MA..............     .51         .51         .52
IQ..............     .43         .69         .69
Pc. Av. ........     .39         .65         .65

* The Pc. Av., or per cent of average development, is an index obtained by 
dividing an individual’s mental unit points by the average mental unit 
points for his age group, mental units being determined by conversion to 
a point scale designed by Heinis. The Pc. Av. is preferred by Kuhlmann to 
the IQ, since it is more constant for retests over a period of years. 


The prediction of Kuhlmann-Anderson IQ in Grade IV from 
Kuhlmann-Anderson MA, IQ, or Pc. Av., in Grade I differs 
somewhat from the situation above. The correlation between 
Grade I MA and Grade IV IQ is .43, or .08 lower than the 
coefficient of .51 between Grade I MA and Grade IV MA. This 
difference of .08 is only 1.4 times as great as the standard error 
of the difference, and thus lacks statistical significance. The E 
value for the correlation of .43 is 9.72 (k = .9028). This is not 
much different from the E of 13.98 for the coefficient between the 
first- and fourth-grade mental age. 

The fourth-grade Pc. Av. is related to first-grade MA by a 
coefficient of correlation of .39. This is practically the same as 
the .43 for MA and IQ. The apparent difference of .12 between 
this coefficient and the coefficient of .51 between first- and 
fourth-grade MA is 2.4 times its standard error and, although 
somewhat more reliable than the difference of .08 mentioned 
above, is still far short of a truly significant difference. The E 
for an r of .39 is 7.92 (k = .9208), indicating a reduction of 
errors of prediction of almost eight per cent. It appears that 
neither MA, IQ, nor Pc. Av. obtained at the beginning of Grade IV 
is very adequately predicted from MA obtained in Grade I. 

Further inspection of Table I seems to indicate a different 
situation with respect to prediction of IQ and Pc. Av. from the 
same measures obtained in Grade I. As mentioned previously, 
neither differs from MA in so far as prediction of Grade IV MA 
is concerned. There is, however, a highly significant difference 
between the predictive efficiency of MA and IQ or Pc. Av. when 
IQ or Pc. Av. in Grade IV is the criterion (i.e., is being predicted). 
Whereas Grade I MA and Grade IV IQ are related by .43, the 
coefficient between Grade I IQ or Pc. Av. and Grade IV IQ is 
.69. The difference of .26 is 5.2 times its standard error (when 
r = .43, k = .9028, and E = 9.72; whereas, when r = .69, 
k = .7238, and E = 27.62). Thus, it appears that the predictive 
efficiency for Grade IV IQ is about three times as high when 
predictions are made from Grade I IQ or Pc. Av. as when made 
from Grade I MA. 

Since the MA, IQ, and Pc. Av. on the Kuhlmann-Anderson 
Intelligence Test obtained at the beginning of Grade IV are not 
adequately predicted from the same scores in Grade I, it appears 
that an intelligence test administered at approximately the same 
time as an achievement test has greater significance for prediction 
purposes than one administered at an earlier date. 


SUMMARY 


An analysis of the relationship between the indices of 
intelligence (MA, IQ, and Pc. Av.) on the Kuhlmann-Anderson 
Intelligence Test in Grade I, and on the same test in Grade IV 
indicates that neither MA, IQ, nor Pc. Av. obtained at the 
beginning of Grade IV is adequately predicted from the same 
scores in Grade I. The low relationship between the three 
indices of intelligence, namely, the MA, IQ, and Pc. Av. in the 
first grade and the beginning fourth grade, as based on the present 
test, is not significant for predicting mental ability at the 
beginning of fourth grade. This low relationship may be due to the 
difference in content material of the two batteries; namely, the 
inclusion of verbal tests in the fourth-grade battery in contrast to 
the first-grade battery which is composed of all non-verbal tests. 

The predictive efficiency of the IQ and Pc. Av. based on the 
Kuhlmann-Anderson Intelligence Test for Grades I and IV is 
relatively the same. 

The validity of long range predictions from a group intelligence 
test given at an early age (first grade) is highly questionable. 
The content material of most first-grade group intelligence tests 
is practically all non-verbal, since entering first-grade pupils 
have not learned to read. As pupils advance through the grades, 
it appears that a group intelligence test with content material of 
a verbal nature has greater validity for predicting successful 
achievement in the tool subjects when administered at approxi- 
mately the same time as an achievement test. Such a combi- 
nation of testing is, also, significant in determining whether 
pupils are working up to their ability. 